Stick-breaking Construction for the Indian Buffet Process Duke University Machine Learning Group Presented by Kai Ni July 27, 2007 Yee Whye Teh, Dilan Görür, and Zoubin Ghahramani, AISTATS 2007

Outline Introduction Indian Buffet Process (IBP) Stick-breaking construction for IBP Slice samplers Results

Introduction Indian Buffet Process (IBP): a distribution over binary matrices consisting of N rows (objects) and an unbounded number of columns (features); a 1/0 in entry (i,k) indicates that feature k is present/absent in object i. An example: the objects are movies ("Terminator 2", "Shrek", and "Shanghai Knights"); the features are "action", "comedy", and "stars Jackie Chan"; the matrix can be [101; 010; 110].

Relationship to CRP IBP and CRP are both tools for defining nonparametric Bayesian models with latent variables. CRP: each object belongs to exactly one of infinitely many latent classes. IBP: each object can possess any combination of infinitely many latent features. The previous Gibbs sampler for the IBP is based on the CRP. In this paper the authors derive a stick-breaking representation for the IBP and develop efficient slice samplers.

Indian Buffet Process Let Z be a random binary N x K matrix, and denote entry (i,k) in Z by z_ik. For each feature k, let u_k be the prior probability that feature k is present in an object, and let α be the strength parameter of the IBP. The full model is u_k ~ Beta(α/K, 1) independently for each feature k, and z_ik | u_k ~ Bernoulli(u_k) independently for each object i. If we integrate out u_k and take the limit K → ∞, we obtain the IBP, in a manner analogous to obtaining the CRP.
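To make the finite model concrete, here is a minimal sketch in Python (our own illustration, not code from the paper; N, K, and alpha are arbitrary choices). As K grows, the expected number of non-empty columns approaches α·H_N, where H_N is the N-th harmonic number.

```python
# Minimal sketch of the finite Beta-Bernoulli model that yields the IBP
# as K -> infinity. N, K, and alpha are illustrative values, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

def sample_finite_model(N=10, K=1000, alpha=2.0):
    # u_k ~ Beta(alpha/K, 1): prior probability that feature k is active.
    u = rng.beta(alpha / K, 1.0, size=K)
    # z_ik | u_k ~ Bernoulli(u_k), independently for each object i.
    Z = (rng.random((N, K)) < u).astype(int)
    return Z

Z = sample_finite_model()
# The number of non-empty columns should be close to alpha * H_N (about 5.9 here).
print("active features:", int((Z.sum(axis=0) > 0).sum()))
```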

Gibbs sampler for IBP For existing features, P(z_ik = 1 | z_{-i,k}) = m_{-i,k}/N, where m_{-i,k} is the number of objects other than i that possess feature k. For new features: the number of new features for object i is drawn from Poisson(α/N).
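The following sketch implements one sweep of this collapsed Gibbs sampler over the prior only; a real sampler would multiply each conditional by the data likelihood p(X | Z). This is our own illustration, and the variable names are not from the paper.

```python
# Hedged sketch of one Gibbs sweep under the IBP prior (likelihood omitted).
import numpy as np

rng = np.random.default_rng(1)

def gibbs_sweep_prior_only(Z, alpha):
    N = Z.shape[0]
    for i in range(N):
        # m_{-i,k}: number of objects other than i that have feature k.
        m = Z.sum(axis=0) - Z[i]
        keep = m > 0                 # features private to i are dropped
        Z, m = Z[:, keep], m[keep]
        # Existing features: P(z_ik = 1 | z_{-i,k}) = m_{-i,k} / N.
        Z[i] = (rng.random(m.shape) < m / N).astype(int)
        # New features for object i: Poisson(alpha / N) of them.
        k_new = rng.poisson(alpha / N)
        if k_new > 0:
            new_cols = np.zeros((N, k_new), dtype=int)
            new_cols[i] = 1
            Z = np.hstack([Z, new_cols])
    return Z
```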

Stick-breaking construction for IBP Let u_(1) > u_(2) > … denote the feature probabilities u_k arranged in strictly decreasing order. The paper shows that this ordered sequence can be generated directly via a stick-breaking recursion: v_k ~ Beta(α, 1) i.i.d., and u_(k) = v_k u_(k-1) = ∏_{l=1}^{k} v_l, so each stick length is a Beta(α, 1) fraction of the previous one.
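A sketch of the construction (our own illustration; alpha and K are arbitrary):

```python
# Stick-breaking for the IBP: each stick length is a Beta(alpha, 1)
# fraction of the previous one, giving a strictly decreasing sequence.
import numpy as np

rng = np.random.default_rng(2)

def stick_breaking_ibp(alpha, K):
    v = rng.beta(alpha, 1.0, size=K)   # v_k ~ Beta(alpha, 1), i.i.d.
    return np.cumprod(v)               # u_(k) = prod_{l<=k} v_l

u = stick_breaking_ibp(alpha=2.0, K=20)
assert np.all(np.diff(u) <= 0)         # stick lengths decrease
```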

Derivation Start from the pdf for each u_k, (α/K) u^{α/K − 1}, and the corresponding cdf, u^{α/K}. The cdf for u_(1) = max_k u_k is the product of the K individual cdfs, which is u^α; differentiating gives the pdf for u_(1), α u^{α−1}, i.e., u_(1) ~ Beta(α, 1). The same argument applies recursively to u_(k) given u_(k−1).

Relation to DP The stick-breaking construction for the DP sets π_k = v_k ∏_{l<k}(1 − v_l) with v_l ~ Beta(1, α): the weights are the pieces broken off the stick. For the IBP, u_(k) = ∏_{l≤k} v_l with v_l ~ Beta(α, 1): the weights are the lengths of the stick that remains after each break. The IBP weights therefore need not sum to one.
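A side-by-side sketch of the two constructions (our own illustration; alpha and K are arbitrary):

```python
# DP vs. IBP stick-breaking: the DP keeps the broken-off pieces,
# the IBP keeps the surviving stick lengths.
import numpy as np

rng = np.random.default_rng(3)
alpha, K = 2.0, 10

# DP: v ~ Beta(1, alpha); pi_k = v_k * prod_{l<k} (1 - v_l).
v_dp = rng.beta(1.0, alpha, size=K)
pi = v_dp * np.cumprod(np.concatenate(([1.0], 1.0 - v_dp[:-1])))

# IBP: v ~ Beta(alpha, 1); u_(k) = prod_{l<=k} v_l.
v_ibp = rng.beta(alpha, 1.0, size=K)
u = np.cumprod(v_ibp)

print("DP weights sum to <= 1:", pi.sum() <= 1.0)
print("IBP weights are decreasing:", bool(np.all(np.diff(u) <= 0)))
```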

Stick-breaking for IBP (2) In truncated stick-breaking for the IBP, let K* be the truncation level. We set u_(k) = 0 for k > K*, and hence z_ik = 0 for k > K*.
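A sketch of drawing Z from this truncated prior (our own illustration):

```python
# Truncated stick-breaking prior: u_(k) = 0, hence z_ik = 0, for k > K_star.
import numpy as np

rng = np.random.default_rng(4)

def truncated_ibp_sample(N, K_star, alpha):
    v = rng.beta(alpha, 1.0, size=K_star)
    u = np.cumprod(v)                        # u_(1) >= ... >= u_(K_star)
    Z = (rng.random((N, K_star)) < u).astype(int)
    return Z, u
```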

Slice Sampler Rather than fixing a truncation level, introduce an auxiliary slice variable s with s | Z, u ~ Uniform[0, u*], where u* = min(1, min_{k: ∃i, z_ik = 1} u_(k)) is the smallest stick length among the currently active features. The effective truncation level then adapts during sampling; adaptive rejection sampling (ARS) is used to draw new stick lengths, whose conditional density is not of standard form.

Sampling 1. Update s: if the new s enlarges K*, iteratively draw new sticks u_(k) until the last stick length u_(K*') falls below s. 2. Update Z: given s, we only need to update z_ik for each i and k ≤ K*. 3. Update the feature parameters for k = 1, …, K*. 4. Update u_(k) for k = 1, …, K*. [Slide figure: decreasing stick lengths u_(k) from feature 1 to K* (old s), extended to K*' (new s); the horizontal bar marks the range of the uniform distribution for s, starting at 0.]
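A sketch of step 1 (our own simplification: new sticks are drawn from the prior recursion u_new = u_prev · Beta(α, 1), whereas the paper samples them from the exact conditional via ARS):

```python
# Step 1 of the slice sampler: resample s and grow the representation
# until every feature with u_(k) >= s is explicitly represented.
# Assumes Z (N x K) and u (length K >= 1) are the current state.
import numpy as np

rng = np.random.default_rng(5)

def update_slice_and_extend(Z, u, alpha):
    N = Z.shape[0]
    active = Z.sum(axis=0) > 0
    u_star = min(1.0, u[active].min()) if active.any() else 1.0
    s = rng.uniform(0.0, u_star)           # s | Z, u ~ Uniform[0, u*]
    while u[-1] >= s:
        # Simplified prior draw; the paper uses ARS for the exact conditional.
        u = np.append(u, u[-1] * rng.beta(alpha, 1.0))
        Z = np.hstack([Z, np.zeros((N, 1), dtype=int)])
    return Z, u, s
```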

Change of Representations IBP: ignores the ordering of the features; stick-breaking IBP: enforces an ordering with decreasing weights. Stick-breaking → IBP: drop the stick lengths and the inactive features, leaving only the K_+ active feature columns along with the corresponding parameters. IBP → stick-breaking: draw the stick lengths, order the features by decreasing stick length, and introduce K_o inactive features until the remaining stick lengths fall below the slice level (the auxiliary variable s on the next slide determines how many are needed).

Semi-ordered Stick-breaking The stick lengths u_k of the K_+ active features are unordered, each drawn from its conditional posterior u_k | Z ~ Beta(m_k, N − m_k + 1), where m_k = Σ_i z_ik. The stick lengths of the inactive features follow the same decreasing stick-breaking construction as the ordered representation. The auxiliary variable s determines how many inactive features need to be added. [Slide figure: K_+ unordered active sticks; K_o ordered inactive sticks below min_k u_(k); the range of the uniform distribution for s starts at 0.]
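A sketch of resampling the active-feature stick lengths (our own illustration; the Beta(m_k, N − m_k + 1) conditional is as stated above):

```python
# Semi-ordered representation: active sticks are unordered and have
# independent Beta posteriors given Z.
import numpy as np

rng = np.random.default_rng(6)

def resample_active_sticks(Z):
    N = Z.shape[0]
    m = Z.sum(axis=0)                  # m_k: number of objects with feature k
    active = m > 0
    u = np.zeros(Z.shape[1])
    u[active] = rng.beta(m[active], N - m[active] + 1)
    return u
```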

Results The conjugate linear-Gaussian binary latent feature model is used to compare the performance of the different samplers. Each data point x_i is modeled as a spherical Gaussian with mean z_{i,:} A and variance σ_x².
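A sketch of generating synthetic data from this likelihood (our own illustration; D, sigma_a, and sigma_x are arbitrary):

```python
# Linear-Gaussian likelihood: x_i ~ N(z_{i,:} A, sigma_x^2 I).
import numpy as np

rng = np.random.default_rng(7)

def sample_data(Z, D=36, sigma_a=1.0, sigma_x=0.5):
    K = Z.shape[1]
    A = rng.normal(0.0, sigma_a, size=(K, D))   # row k: parameters of feature k
    X = Z @ A + rng.normal(0.0, sigma_x, size=(Z.shape[0], D))
    return X, A
```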

Demonstration The semi-ordered slice sampler is applied to 1000 handwritten images of the digit 3 from the MNIST dataset.

Conclusion The authors derived novel stick-breaking representations of the Indian buffet process. Based on these representations, they propose new MCMC samplers that are easy to implement and work for more general models than Gibbs sampling.