Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process Chong Wang and David M. Blei NIPS 2009 Discussion led by Chunping Wang.


Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process
Chong Wang and David M. Blei, NIPS 2009
Discussion led by Chunping Wang, ECE, Duke University, March 26, 2010

Outline 1/16
• Motivations
• LDA and HDP-LDA
• Sparse Topic Models
• Inference Using Collapsed Gibbs Sampling
• Experiments
• Conclusions

Motivations 2/16
• Topic modeling with the "bag of words" assumption; the proposed model is an extension of HDP-LDA.
• In the LDA and HDP-LDA models, the topics are drawn from an exchangeable Dirichlet distribution with a single scale parameter. As that scale parameter approaches zero, topics become
o sparse: most probability mass sits on only a few terms
o less smooth: the empirical word counts dominate the estimates
• Goal: decouple sparsity and smoothness so that both properties can be achieved at the same time.
• How: a Bernoulli selector variable is introduced for each term in each topic.
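To make the coupling concrete, here is a minimal numpy sketch (toy vocabulary size and scale values of my choosing, not from the paper or the slides): topics drawn from a symmetric Dirichlet concentrate their mass on a handful of terms as the scale parameter shrinks, which is exactly the sparsity-versus-smoothness trade-off described above.

```python
# Illustration only: sample topics from a symmetric Dirichlet with a shrinking
# scale parameter and measure how much probability mass the few largest terms get.
import numpy as np

rng = np.random.default_rng(0)
V = 20  # toy vocabulary size

for scale in [1.0, 0.1, 0.01]:
    topic = rng.dirichlet(np.full(V, scale))      # one topic over V terms
    top3 = np.sort(topic)[::-1][:3].sum()         # mass on the 3 largest terms
    print(f"scale={scale:5.2f}  mass on top-3 terms = {top3:.2f}")
```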

LDA and HDP-LDA 3/16
[Graphical models of LDA and HDP-LDA, with labels for topics, documents, and words, the base measure, and the topic weights.]
HDP-LDA is the nonparametric form of LDA, with the number of topics unbounded.
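For reference, the HDP-LDA generative process can be written as follows; this uses the standard notation from the HDP literature (G_0, alpha, gamma, eta), which may not match the symbols on the slide.

```latex
\begin{align*}
G_0 \mid \gamma, H &\sim \mathrm{DP}(\gamma, H), \qquad H = \mathrm{Dirichlet}(\eta) \text{ over the vocabulary simplex} \\
G_d \mid \alpha, G_0 &\sim \mathrm{DP}(\alpha, G_0), \qquad d = 1, \dots, D \\
\theta_{dn} \mid G_d &\sim G_d \\
w_{dn} \mid \theta_{dn} &\sim \mathrm{Multinomial}(\theta_{dn})
\end{align*}
```

LDA is the parametric counterpart: a fixed set of K topics drawn from an exchangeable Dirichlet, with per-document topic proportions drawn from a finite Dirichlet.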

Sparse Topic Models 4/16
The size of the vocabulary is V.
• A standard topic is defined on the full (V-1)-simplex.
• A sparse topic is defined on a sub-simplex specified by a V-length binary selector vector, composed of V Bernoulli variables, with one selection proportion per topic.
• Sparsity: the pattern of ones in the selector vector, controlled by the topic's selection proportion.
• Smoothness: enforced only over the terms with non-zero selectors, through the Dirichlet smoothing parameter.
Decoupled!

Sparse Topic Models 5/16
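A minimal sketch of how one sparse topic could be generated under this construction; the variable names and hyper-parameter values are my own, and the guard against an all-zero selector is an implementation convenience rather than part of the model.

```python
# Sketch: a Bernoulli selector per term decides which terms are "on"; the topic is
# then Dirichlet-smoothed only over the selected terms.
import numpy as np

rng = np.random.default_rng(1)
V = 30           # toy vocabulary size
gamma = 0.5      # Dirichlet smoothing over the selected terms (assumed value)
a, b = 1.0, 1.0  # Beta prior on the per-topic selection proportion (assumed)

pi_k = rng.beta(a, b)                  # selection proportion for topic k
b_k = rng.binomial(1, pi_k, size=V)    # one Bernoulli selector per term
if b_k.sum() == 0:                     # convenience guard: keep at least one term
    b_k[rng.integers(V)] = 1

beta_k = np.zeros(V)
on = np.flatnonzero(b_k)
beta_k[on] = rng.dirichlet(np.full(on.size, gamma))  # smooth only the selected terms
print(f"{on.size} of {V} terms selected; topic sums to {beta_k.sum():.2f}")
```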

Inference Using Collapsed Gibbs Sampling 6/16
As in HDP-LDA:
• Topic proportions and topic distributions are integrated out.
• The direct-assignment method based on the Chinese restaurant franchise (CRF) is used for the topic assignments, together with an augmented variable, the table counts.

Inference Using Collapsed Gibbs Sampling 7/16
Notation:
• the # of customers (words) in restaurant d (document) eating dish k (topic)
• the # of tables in restaurant d serving dish k
• marginal counts, represented with dots
• K, u: the current # of topics and the index of a new topic, respectively
• the # of times that term v has been assigned to topic k
• the # of times that any term has been assigned to topic k
• the conditional density of a word under topic k, given all data except that word
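As a concrete picture of this bookkeeping, a small sketch of the count arrays the sampler maintains; the names n_dk, m_dk, and n_kv are my shorthand for the quantities listed above, not necessarily the slide's symbols.

```python
# Hypothetical count arrays for the collapsed sampler, updated whenever a word's
# topic assignment changes; the table counts are resampled in a separate step.
import numpy as np

D, V, K = 3, 10, 2                  # documents, vocabulary size, current topics (toy)
n_dk = np.zeros((D, K), dtype=int)  # words in document d assigned to topic k
m_dk = np.zeros((D, K), dtype=int)  # CRF tables in document d serving topic k
n_kv = np.zeros((K, V), dtype=int)  # times term v has been assigned to topic k

def assign(d: int, v: int, k: int) -> None:
    """Record that one occurrence of term v in document d is assigned to topic k."""
    n_dk[d, k] += 1
    n_kv[k, v] += 1

assign(0, 4, 1)
print(n_dk.sum(axis=1), n_kv[:, 4])
```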

Inference Using Collapsed Gibbs Sampling 8/16
Recall the direct-assignment sampling method for HDP-LDA:
• Sampling topic assignments: if a new topic is sampled, a new stick weight is split off from the remaining stick mass and the number of topics is incremented.
• Sampling the stick lengths (the global topic weights).
• Sampling the table counts.
For HDP-LDA the topic-assignment step is straightforward. For the sparse TM it is not: instead of sampling the Bernoulli selectors, the authors integrate them out for faster convergence. Since there are 2^V possible selector vectors per topic, this sum is the central computational challenge of the sparse TM.
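For the "new topic" case, the standard HDP direct-assignment update splits the leftover stick mass; a minimal sketch follows (the symbols beta, beta_u, and gamma are the usual ones from the HDP literature, not necessarily the slide's).

```python
# When a new topic is instantiated, split the remaining stick mass beta_u:
# a Beta(1, gamma) draw gives the new topic its global weight, and the rest
# stays reserved for topics not yet seen.
import numpy as np

rng = np.random.default_rng(2)
gamma = 1.0                      # top-level DP concentration (assumed value)
beta = [0.5, 0.3]                # global weights of the instantiated topics
beta_u = 0.2                     # leftover mass for not-yet-seen topics

nu = rng.beta(1.0, gamma)        # stick-breaking proportion for the new topic
beta.append(nu * beta_u)         # weight of the newly created topic
beta_u = (1.0 - nu) * beta_u     # remaining mass for future topics

print(beta, beta_u, sum(beta) + beta_u)  # the weights still sum to 1
```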

Inference Using Collapsed Gibbs Sampling 9/16
[Equation for the topic-assignment conditional with the selectors integrated out, defined in terms of the set of terms that already have word assignments in topic k.]
This conditional probability depends on the selection proportions.

Inference Using Collapsed Gibbs sampling 10/16

Inference Using Collapsed Gibbs Sampling 11/16
• Sampling the Bernoulli parameter (the selection proportion), using the selectors as auxiliary variables.
• Sampling the hyper-parameters:
o one group with Gamma(1,1) priors
o the other via Metropolis-Hastings with a symmetric Gaussian proposal
• Estimating the topic distributions from any single sample of z and b: define the set of terms with an "on" selector; the topic estimate is then built in two conditional sampling steps, one governing the sparsity pattern and one giving smoothness over the selected terms.
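A generic sketch of the Metropolis-Hastings step mentioned above, with a symmetric Gaussian random-walk proposal. Here log_post is a stand-in for the relevant conditional log-posterior (the toy Gamma density is only for the demo), so this is not the paper's actual update.

```python
# One random-walk MH update for a positive scalar hyper-parameter. With a symmetric
# proposal, the acceptance ratio reduces to the ratio of posterior densities.
import numpy as np

rng = np.random.default_rng(3)

def mh_step(x, log_post, step=0.1):
    x_new = x + rng.normal(0.0, step)    # symmetric Gaussian proposal
    if x_new <= 0:                       # outside the support: reject immediately
        return x
    if np.log(rng.uniform()) < log_post(x_new) - log_post(x):
        return x_new
    return x

toy_log_post = lambda x: np.log(x) - x   # stand-in: Gamma(2, 1) log-density

x = 1.0
for _ in range(200):
    x = mh_step(x, toy_log_post)
print(f"final sample: {x:.3f}")
```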

Experiments 12/16
Four datasets:
• arXiv: online research abstracts, D = 2500, V = 2873
• Nematode Biology: research abstracts, D = 2500, V = 2944
• NIPS: NIPS articles; only a percentage of the words of each paper is used
• Conf. abstracts: abstracts from CIKM, ICML, KDD, NIPS, SIGIR and WWW
Two predictive quantities: [equations defining the predictive scores and the topic complexity.]
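One of the quantities is held-out perplexity (referenced on the next slide). For reference, the standard per-word definition is below; the paper's exact estimator of p(w_d) may differ.

```latex
\mathrm{perplexity}(\mathcal{D}_{\mathrm{test}})
  = \exp\!\left( - \frac{\sum_{d=1}^{D_{\mathrm{test}}} \log p(\mathbf{w}_d)}
                        {\sum_{d=1}^{D_{\mathrm{test}}} N_d} \right)
```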

Experiments 13/16
[Perplexity and model-size plots. Annotations: better perplexity with simpler models; larger smoothing parameter: smoother; fewer topics; similar # of terms.]

Experiments 14/16

Experiments 15/16
[Figure annotations:] With a small scale parameter (<0.01) the topics lack smoothness, so more topics are needed to explain all the patterns of empirical word counts, and infrequent words populate "noise" topics.

Conclusions 16/16
A new topic model in the HDP-LDA framework, based on the "bag of words" assumption.
Main contributions:
• Decoupling the control of sparsity and smoothness by introducing binary term selectors for each topic;
• Developing a collapsed Gibbs sampler in the HDP-LDA framework.
Held-out performance is better than that of HDP-LDA.