Hierarchical Beta Process and the Indian Buffet Process by R. Thibaux and M. I. Jordan Discussion led by Qi An.

Outline
– Introduction
– Indian buffet process (IBP)
– Beta process (BP)
– Connections between IBP and BP
– Hierarchical beta process (hBP)
– Application to document classification
– Conclusions

Introduction
Mixture models vs. factorial models:
Mixture models
– Each data point is drawn from one mixture component
– The number of mixture components need not be set a priori
– They define a distribution over partitions of the data
Factorial models
– Each data point is associated with a set of latent Bernoulli variables
– The cardinality of the set of features can vary from object to object
– They give a "featural" description of objects
– They provide a natural way to define interesting topologies on clusters
– They may be appropriate when the number of clusters is large
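To make the contrast concrete, here is a minimal sketch (my own illustration, not from the paper; the sizes and probabilities are arbitrary) of the two representations: a mixture model assigns each object to exactly one cluster, while a factorial model gives each object its own binary feature vector.

```python
import numpy as np

rng = np.random.default_rng(0)
n_objects, n_features = 5, 4

# Mixture (partition) view: each object belongs to exactly one cluster.
cluster_assignments = rng.integers(0, 3, size=n_objects)

# Factorial (featural) view: each object owns a binary subset of features,
# so a single object can exhibit several latent features at once.
feature_matrix = (rng.random((n_objects, n_features)) < 0.3).astype(int)

print(cluster_assignments)  # a partition of the objects into clusters
print(feature_matrix)       # binary matrix: rows = objects, cols = features
```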

Beta process
The beta process B ~ BP(c, B_0) is a special case of an independent increments process, or Lévy process. If we draw a set of points {(ω_i, p_i)} from a Poisson process with base measure ν, then
B = Σ_i p_i δ_{ω_i}
When the base measure B_0 is discrete, B_0 = Σ_i q_i δ_{ω_i}, then B has atoms at the same locations, with weights p_i ~ Beta(c q_i, c (1 − q_i)). As the representation shows, B is discrete with probability one. A Lévy process can be characterized by its Lévy measure; for the beta process, it is
ν(dω, dp) = c p^{−1} (1 − p)^{c−1} dp B_0(dω)
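The discrete-base case is easy to simulate directly. Below is a minimal sketch (my own, assuming the base measure is given as a vector of atom masses q_i in (0, 1)): each atom keeps its location, and its weight is an independent Beta(c q_i, c (1 − q_i)) draw.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_bp_discrete(c, q):
    """Draw B ~ BP(c, B0) when B0 is discrete with atom masses q.

    Each atom keeps its location; its weight is an independent
    Beta(c*q_i, c*(1 - q_i)) variable, per the formula above.
    """
    q = np.asarray(q, dtype=float)
    return rng.beta(c * q, c * (1.0 - q))

# Example: a base measure with three atoms of mass 0.2, 0.5, 0.3.
weights = draw_bp_discrete(c=2.0, q=[0.2, 0.5, 0.3])
print(weights)  # the atom weights of the draw B
```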

Bernoulli process
Given a draw B = Σ_i p_i δ_{ω_i} of a beta process, the Bernoulli process X ~ BeP(B) is
X = Σ_i z_i δ_{ω_i},  z_i ~ Bernoulli(p_i)
Here, Ω can be viewed as a set of potential features, and the random measure B defines the probability that X possesses each particular feature. In the Indian buffet process metaphor, X is a customer and its features are the dishes the customer tastes.
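Given the atom weights of a discrete B, a Bernoulli process draw is just a row of independent coin flips. A quick sketch (my own; the weights below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_bep(weights, n_customers=1):
    """Draw X_1..X_n ~ BeP(B) for a discrete B with the given atom weights.

    Entry (n, i) is 1 if customer n tastes dish i, which happens
    independently with probability weights[i].
    """
    weights = np.asarray(weights, dtype=float)
    return (rng.random((n_customers, weights.size)) < weights).astype(int)

X = draw_bep([0.9, 0.5, 0.1], n_customers=4)
print(X)  # binary feature matrix: rows = customers, columns = dishes
```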

Connections between IBP and BP
It is proven that observations X_1, …, X_n drawn i.i.d. from BeP(B) with B ~ BP(c, B_0) marginally follow a two-parameter Indian buffet process. Procedure: the first customer tries Poi(γ) dishes (features). After that, customer n tastes previously tried dish j with probability m_j/(c + n − 1), where m_j is the number of earlier customers who tasted dish j, and then tries Poi(c γ/(c + n − 1)) new dishes. As a result, the beta process is a two-parameter (c, γ) generalization of the Indian buffet process: IBP = BP(c = 1, γ = α), where γ = B_0(Ω) is the total mass of the base measure.
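The procedure above translates directly into a sampler. Here is a sketch of the two-parameter IBP (my own implementation of the slide's description, not the authors' code; setting c = 1 recovers the standard IBP):

```python
import numpy as np

rng = np.random.default_rng(2)

def two_param_ibp(n_customers, c=1.0, gamma=2.0):
    """Sample a binary feature matrix from the two-parameter (c, gamma) IBP.

    Customer n tastes existing dish j with probability m_j / (c + n - 1)
    and orders Poi(c * gamma / (c + n - 1)) new dishes; c = 1, gamma = alpha
    gives the one-parameter IBP.
    """
    dish_counts = []  # m_j: how many customers have tasted dish j so far
    rows = []
    for n in range(1, n_customers + 1):
        old = len(dish_counts)
        tasted = [j for j in range(old)
                  if rng.random() < dish_counts[j] / (c + n - 1)]
        k_new = rng.poisson(c * gamma / (c + n - 1))
        tasted += list(range(old, old + k_new))
        dish_counts.extend([0] * k_new)
        for j in tasted:
            dish_counts[j] += 1
        rows.append(tasted)
    Z = np.zeros((n_customers, len(dish_counts)), dtype=int)
    for n, tasted in enumerate(rows):
        Z[n, tasted] = 1
    return Z

print(two_param_ibp(5))  # rows = customers, columns = dishes
```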

After n customers, the total number of unique dishes is distributed as Poi(γ Σ_{i=1}^{n} c/(c + i − 1)). This quantity becomes Poi(γ) as c → 0 (all customers share the same dishes) and Poi(nγ) as c → ∞ (no sharing).
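A quick numeric check of the two limits (my own sketch, with arbitrary n and γ):

```python
import numpy as np

def expected_dishes(n, c, gamma):
    """Mean of the Poi(gamma * sum_{i=1..n} c/(c+i-1)) dish count."""
    i = np.arange(1, n + 1)
    return gamma * np.sum(c / (c + i - 1))

# With n = 10 customers and gamma = 3, the mean moves between the limits:
print(expected_dishes(10, 1e-6, 3.0))  # ~ 3  = gamma     (c -> 0, full sharing)
print(expected_dishes(10, 1e6, 3.0))   # ~ 30 = n * gamma (c -> inf, no sharing)
```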

An algorithm to generate beta processes
The authors propose to generate an approximation B̂ of B. Let B̂ ← 0. For each step n ≥ 1, draw K_n ~ Poi(γ c/(c + n − 1)) new atoms, give each a location ω ~ B_0/γ and a weight p ~ Beta(1, c + n − 1), and add p δ_ω to B̂. Truncating after N steps gives an approximation whose unaccounted mass shrinks as N grows.
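Read as code, the construction looks as follows. This is my own sketch, not the authors' implementation (the uniform base measure and the truncation level are arbitrary choices, and the step-n distributions are my reconstruction from the IBP connection above):

```python
import numpy as np

rng = np.random.default_rng(3)

def approximate_bp(c, gamma, n_steps, base_sampler):
    """Truncated approximation of B ~ BP(c, B0), with B0(Omega) = gamma.

    At step n, add K_n ~ Poi(gamma * c / (c + n - 1)) atoms, each with a
    location from the normalized base measure B0/gamma and a weight from
    Beta(1, c + n - 1). The mass missed by truncation shrinks with n_steps.
    """
    locations, weights = [], []
    for n in range(1, n_steps + 1):
        k_n = rng.poisson(gamma * c / (c + n - 1))
        for _ in range(k_n):
            locations.append(base_sampler())
            weights.append(rng.beta(1.0, c + n - 1))
    return np.array(locations), np.array(weights)

# Example with a uniform base measure on [0, 1].
locs, wts = approximate_bp(c=1.0, gamma=5.0, n_steps=100,
                           base_sampler=lambda: rng.random())
print(len(locs), wts[:5])
```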

Hierarchical beta process
Consider a document classification problem. We have a training set X, a list of documents, each labeled with one of n topics. We model a document by the set of words it contains, and assume document X_{i,j} is generated by including each word w independently with a probability p_{j,w} specific to topic j. These probabilities form a discrete measure A_j over the word space Ω, on which we can put a beta process prior BP(c_j, B). Since we want sharing across topics, B must be discrete, so we put a beta process prior BP(c_0, B_0) on B, which allows the A_j to share the same atoms. The hBP model can be summarized as:
B ~ BP(c_0, B_0)
A_j | B ~ BP(c_j, B)  for each topic j
X_{i,j} | A_j ~ BeP(A_j)  for each document i with topic j
This model can be solved with a Monte Carlo inference algorithm.
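To see how the pieces fit together, here is a generative sketch of the hBP hierarchy over a fixed finite vocabulary (my own illustration, not the paper's inference code; the vocabulary size, concentrations, and total mass are arbitrary, and a finite B_0 sidesteps the infinite-dimensional draw):

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_hbp_documents(vocab_size, n_topics, docs_per_topic,
                         c0=1.0, cj=10.0, gamma=5.0):
    """Generate documents from the hBP hierarchy over a finite vocabulary.

    B0 is discrete and uniform with total mass gamma, so B ~ BP(c0, B0)
    gives each word w a weight b_w ~ Beta(c0*gamma/V, c0*(1 - gamma/V));
    each topic draws A_j | B ~ BP(cj, B) the same way, and each document
    includes word w independently with probability A_j[w] (X ~ BeP(A_j)).
    """
    V = vocab_size
    assert gamma < V  # each atom of B0 must have mass less than 1
    b = rng.beta(c0 * gamma / V, c0 * (1.0 - gamma / V), size=V)  # B | B0
    b = np.clip(b, 1e-9, 1.0 - 1e-9)  # guard against numerical underflow
    docs = []
    for j in range(n_topics):
        a_j = rng.beta(cj * b, cj * (1.0 - b))  # A_j | B
        for _ in range(docs_per_topic):
            docs.append((j, (rng.random(V) < a_j).astype(int)))  # X | A_j
    return docs

docs = sample_hbp_documents(vocab_size=50, n_topics=3, docs_per_topic=2)
print(docs[0][0], docs[0][1])  # (topic label, binary word-inclusion vector)
```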

Applications
The authors applied the hierarchical beta process to a document classification problem and compared it to a Naïve Bayes classifier (with Laplace smoothing). The hBP model obtains 58% classification accuracy, while the best Naïve Bayes result is 50%.

Conclusions
– The beta process is shown to be suitable for nonparametric Bayesian factorial modeling
– The beta process can be extended to a recursively defined hierarchy of beta processes
– Compared to the Dirichlet process, the beta process has the potential advantage of being an independent increments process
– More work on inference algorithms is necessary to fully exploit beta process models