Presentation transcript:

The IBP Compound Dirichlet Process and its Application to Focused Topic Modeling Sinead Williamson, Chong Wang, Katherine A. Heller, David M. Blei Presented by Eric Wang, 9/16/2011

Introduction Latent Dirichlet Allocation (LDA) is a powerful and ubiquitous topic modeling framework. Incorporating the hierarchical Dirichlet process (HDP) into LDA allows for more flexible topic modeling by estimating the global topic proportions from the data. A drawback of HDP-LDA is that a topic that is rare globally will also have a low expected proportion within each document. The authors propose a model that allows a globally rare topic to still carry large mass within individual documents.

Hierarchical Dirichlet Process The hierarchical Dirichlet process (HDP) is a prior for Bayesian nonparametric mixed-membership modeling of grouped data. Hierarchically, it can be defined as $G_0 \sim \mathrm{DP}(\gamma, H)$ and $G_m \sim \mathrm{DP}(\alpha, G_0)$, where $m$ indexes the data group. In the HDP, the expected mixing weights in $G_m$ equal the global weights $\beta$ of $G_0$, i.e. $\mathbb{E}[\pi_{mk} \mid \beta] = \beta_k$; in practice, the mixing weights in $G_0$ are the global average of the groups' mixture memberships.
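As a concrete illustration (not from the slides), here is a minimal NumPy sketch of a truncated HDP draw: global weights via stick-breaking, then group-level weights whose expectation matches the global weights. The truncation level K, the hyperparameter values, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
K, M = 50, 5                # truncation level and number of groups (illustrative)
gamma, alpha = 5.0, 2.0     # top-level and group-level concentrations

# Global weights beta ~ GEM(gamma) via truncated stick-breaking.
v = rng.beta(1.0, gamma, size=K)
beta = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))

# Group-level weights pi_m ~ Dirichlet(alpha * beta), so E[pi_m | beta] = beta:
# every group concentrates around the same global proportions.
pi = rng.dirichlet(alpha * beta + 1e-12, size=M)  # tiny jitter keeps params > 0
```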

Indian Buffet Process The Indian buffet process (IBP) defines a distribution over binary matrices with an infinite number of columns and a finite number of non-zero entries. Hierarchically, it is defined as $\pi_k \sim \mathrm{Beta}(\gamma/K, 1)$ and $b_{mk} \sim \mathrm{Bernoulli}(\pi_k)$ with $K \to \infty$, where $m$ and $k$ index the rows and columns of the binary matrix $B$. It can also be represented via a stick-breaking construction, $\pi_k = \prod_{j=1}^{k} \nu_j$ with $\nu_j \sim \mathrm{Beta}(\gamma, 1)$.
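A minimal sketch (my own, for intuition) of simulating the IBP through its stick-breaking construction: the column probabilities decay with the column index, so each row activates only finitely many columns with probability one.

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, gamma = 10, 30, 3.0      # rows, truncation level, IBP concentration

# Stick-breaking: pi_k = prod_{j<=k} nu_j with nu_j ~ Beta(gamma, 1),
# so the column probabilities are strictly decreasing in k.
pi = np.cumprod(rng.beta(gamma, 1.0, size=K))

# Each entry b[m, k] ~ Bernoulli(pi_k), independently across rows.
B = (rng.random((M, K)) < pi).astype(int)
print(B.sum(axis=0))           # column usage counts decay with k
```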

IBP Compound Dirichlet Process Combining the HDP and the IBP into a single prior yields an infinite "spike-and-slab" prior, the IBP compound Dirichlet process (ICD). A spike distribution (the IBP) determines which variables are drawn from the slab (the DP). The model assumes a two-stage generative process: an IBP draw selects which atoms each data group uses, and a Dirichlet draw over the selected atoms determines their weights, as detailed on the next slide.

IBP Compound Dirichlet Process The atom masses of data group $m$ are Dirichlet distributed as $\theta_m \sim \mathrm{Dirichlet}(b_m \odot \phi)$, where $\phi_k$ is the relative mass of atom $k$. In this construction, the $\theta_m$ are the topic proportions for document $m$ and $b_m$ is a binary vector indicating usage of the dictionary elements.
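To make the construction concrete, here is a small sketch (parameter values and the Gamma prior on the masses are illustrative) of drawing topic proportions that are exactly zero outside the spike-selected topics. A Dirichlet with zero parameters is undefined, so the draw is restricted to the active coordinates.

```python
import numpy as np

rng = np.random.default_rng(2)
K = 8
phi = rng.gamma(1.0, 1.0, size=K)        # relative topic masses (illustrative)
b = rng.random(K) < 0.5                  # spike: which topics this group uses
if not b.any():
    b[0] = True                          # guard: at least one active topic

# theta ~ Dirichlet(b * phi): place Dirichlet mass only on the active topics.
theta = np.zeros(K)
theta[b] = rng.dirichlet(phi[b])
print(np.round(theta, 3))                # zeros exactly where b is False
```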

Focused Topic Models The authors use the ICD to develop the focused topic model (FTM). In this framework, a global distribution over topics is drawn and shared across all documents, as in HDP-LDA. Each document then selects a subset of topics from the global menu; the subset is determined by the binary vector $b_m$. Since the binary vector is independent of the global topic proportions, topics that are rare globally can still make up a large proportion of individual documents.

Focused Topic Models The generative process for the FTM is as follows:
1. For each topic $k = 1, 2, \dots$: draw the stick length $\pi_k$ via the IBP stick-breaking construction, the relative mass $\phi_k \sim \mathrm{Gamma}(\gamma, 1)$, and the topic's word distribution $\beta_k \sim \mathrm{Dirichlet}(\eta)$.
2. For each document $m$: draw the binary topic indicators $b_{mk} \sim \mathrm{Bernoulli}(\pi_k)$ and the topic proportions $\theta_m \sim \mathrm{Dirichlet}(b_m \odot \phi)$.
3. For each word $i$ in document $m$: draw the topic assignment $z_{mi} \sim \mathrm{Discrete}(\theta_m)$ and the word $w_{mi} \sim \mathrm{Discrete}(\beta_{z_{mi}})$.
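The following sketch samples a toy corpus from this generative process under a fixed truncation; the hyperparameter names follow the description above, but the values, the truncation, and the function itself are illustrative rather than the paper's settings.

```python
import numpy as np

def generate_ftm_corpus(M=20, K=30, V=500, n_words=100,
                        alpha=3.0, gamma=1.0, eta=0.1, seed=0):
    """Sample documents from a truncated FTM generative process (sketch)."""
    rng = np.random.default_rng(seed)
    pi = np.cumprod(rng.beta(alpha, 1.0, size=K))    # IBP stick lengths
    phi = rng.gamma(gamma, 1.0, size=K)              # relative topic masses
    topics = rng.dirichlet(np.full(V, eta), size=K)  # beta_k over the vocabulary
    docs = []
    for m in range(M):
        b = rng.random(K) < pi                       # topics this document uses
        if not b.any():
            b[0] = True                              # guard: at least one topic
        theta = np.zeros(K)
        theta[b] = rng.dirichlet(phi[b])             # focused topic proportions
        z = rng.choice(K, size=n_words, p=theta)     # topic indicator per word
        docs.append([rng.choice(V, p=topics[k]) for k in z])
    return docs, topics

docs, topics = generate_ftm_corpus()
```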

Posterior Inference To sample the topic indicator for word $i$ in document $m$, the topic proportions $\theta_m$ and topic-word distributions $\beta_k$ are integrated out, giving a collapsed conditional of the form $p(z_{mi} = k \mid \mathbf{z}^{-mi}, \mathbf{w}, b, \phi) \propto b_{mk}\,(n^{-mi}_{mk} + \phi_k)\,\frac{n^{-mi}_{k w_{mi}} + \eta}{n^{-mi}_{k\cdot} + V\eta}$, where the integrals have analytical forms by Dirichlet-multinomial conjugacy and $n^{-mi}_{mk}$ counts the words in document $m$ assigned to topic $k$, excluding word $i$. This is an important point because it suggests a general framework that can be adapted to other applications.
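A sketch of one such collapsed update, assuming standard Dirichlet-multinomial conjugacy for both the topic proportions and the topic-word distributions; the count arrays and the function signature are my own, not the paper's.

```python
import numpy as np

def resample_z(i, m, w, z, b, phi, n_mk, n_kw, n_k, eta, V, rng):
    """One collapsed Gibbs step for word i of document m (illustrative)."""
    k_old = z[m][i]
    # Remove this word's current assignment from the sufficient statistics.
    n_mk[m, k_old] -= 1; n_kw[k_old, w] -= 1; n_k[k_old] -= 1
    # p(z=k | rest) ∝ b_mk * (n_mk + phi_k) * (n_kw + eta) / (n_k + V*eta);
    # the first factor zeroes out topics the document does not use.
    p = b[m] * (n_mk[m] + phi) * (n_kw[:, w] + eta) / (n_k + V * eta)
    k_new = rng.choice(len(phi), p=p / p.sum())
    n_mk[m, k_new] += 1; n_kw[k_new, w] += 1; n_k[k_new] += 1
    z[m][i] = k_new
```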

Posterior Inference The joint probability of the binary indicators $b_{\cdot k}$ and the total number of words assigned to topic $k$ is log-differentiable with respect to $\phi_k$ and $\pi_k$. A hybrid Monte Carlo algorithm is therefore used to sample these parameters from their posteriors.
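For reference, here is a generic leapfrog step of the kind hybrid (Hamiltonian) Monte Carlo uses; sampling $\phi_k$ and $\pi_k$ this way would additionally require an unconstrained reparameterization (e.g., log for $\phi$, logit for $\pi$). This is a textbook sketch, not the paper's implementation.

```python
import numpy as np

def hmc_step(x, log_p, grad_log_p, eps=0.01, n_leapfrog=20, rng=None):
    """One HMC step on an unconstrained parameter vector (textbook sketch)."""
    rng = rng or np.random.default_rng()
    p0 = rng.standard_normal(x.shape)                 # resample momentum
    x_new = x.copy()
    p = p0 + 0.5 * eps * grad_log_p(x_new)            # initial half step
    for _ in range(n_leapfrog):
        x_new = x_new + eps * p                       # full position step
        p = p + eps * grad_log_p(x_new)               # full momentum step
    p = p - 0.5 * eps * grad_log_p(x_new)             # undo the extra half step
    # Metropolis correction on the joint (position, momentum) energy.
    log_accept = (log_p(x_new) - 0.5 * p @ p) - (log_p(x) - 0.5 * p0 @ p0)
    return x_new if np.log(rng.random()) < log_accept else x
```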

Posterior Inference The topic weights are sampled from their Dirichlet conditional given the binary indicators, and the binary topic indicators are sampled from their Bernoulli conditional: $b_{mk} = 1$ whenever topic $k$ has words assigned in document $m$, and otherwise $b_{mk}$ is resampled by weighing the prior $\pi_k$ against the probability of the topic receiving zero words. Notice here that if a topic is used, it is automatically considered "active", and additional (unused) topics can be activated.
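A sketch of the indicator update just described, under the standard Dirichlet-multinomial marginalization (the exact expression in the paper may differ in notation). The log-Gamma terms give the probability that an active topic receives zero of the document's words; all names and the signature are illustrative.

```python
import numpy as np
from scipy.special import gammaln

def resample_b(m, k, b, phi, pi, n_mk, n_m, rng):
    """Resample the topic-usage indicator b[m, k] (illustrative sketch)."""
    if n_mk[m, k] > 0:
        b[m, k] = 1                      # a used topic is automatically active
        return
    b[m, k] = 1
    s = b[m] @ phi                       # total Dirichlet mass with k active
    # log p(zero of the n_m words hit topic k | b_mk = 1), via the
    # Dirichlet-multinomial marginal with topic-k mass phi[k] removed/added.
    log_miss = (gammaln(s) + gammaln(s - phi[k] + n_m[m])
                - gammaln(s + n_m[m]) - gammaln(s - phi[k]))
    odds_on = pi[k] * np.exp(log_miss)   # prior times zero-count likelihood
    p_on = odds_on / (odds_on + (1.0 - pi[k]))
    b[m, k] = int(rng.random() < p_on)
```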

Empirical Results The authors considered three different text datasets (listed in a table on the slide). All models were run for 1,000 iterations, with the first 500 iterations discarded as burn-in.

Empirical Results [Slide figures: model perplexity and topic correlation comparisons across models.]

Empirical Results In (a), the authors compare the number of topics each word appears in; the FTM has more concentrated topics. In (b), they show the number of documents each topic appears in. The plot illustrates that HDP has many topics that appear in only a few documents, while a significant portion of the FTM topics appear in many documents.

Discussion The authors have proposed a novel prior, the IBP compound Dirichlet process (ICD), that decouples across-data topic prevalence from within-data topic proportions. The focused topic model (FTM), developed from the ICD, addresses a key shortcoming of HDP-LDA: in HDP-LDA, a topic's global prevalence limits the proportion it can occupy within a document, whereas in the FTM, globally rare topics can still be heavily used within individual documents. The FTM shows improved perplexity relative to HDP-LDA.