Slide 1
A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation
Yee W. Teh, David Newman and Max Welling. Published at NIPS 2006. Discussion led by Iulian Pruteanu.
Slide 2
Outline
- Introduction
- Approximate inference for LDA
- Collapsed VB inference for LDA
- Experimental results
- Conclusions
Slide 3
Introduction (1/2)
- Latent Dirichlet Allocation is suitable for many applications, from document modeling to computer vision.
- Collapsed Gibbs sampling seems to be the preferred choice for large-scale problems; however, it has its own drawbacks: convergence of the sampler is hard to assess, and many samples may be needed for accurate estimates.
- The CVB algorithm, which makes use of simple approximations, is easy to implement and more accurate than standard VB.
Slide 4
Introduction (2/2)
This paper:
- proposes an improved VB algorithm based on integrating out the model parameters, under the assumption that the latent variables are mutually independent;
- uses a Gaussian approximation for computational efficiency.
Slide 5
Approximate inference for LDA (1/3)
Notation:
- x = {x_ij}: observed words
- z = {z_ij}: latent variables (topic indices)
- θ_j: mixing proportions
- φ_k: topic parameters
- D: number of documents
- K: number of topics
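For reference, these are the variables of the standard LDA generative process with symmetric Dirichlet priors α and β (a sketch of the model as usually written; indexing word i in document j as x_ij is assumed here):

\theta_j \sim \mathrm{Dirichlet}(\alpha), \qquad \phi_k \sim \mathrm{Dirichlet}(\beta)
z_{ij} \mid \theta_j \sim \mathrm{Mult}(\theta_j), \qquad x_{ij} \mid z_{ij}, \phi \sim \mathrm{Mult}(\phi_{z_{ij}})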
Slide 6
Approximate inference for LDA (2/3)
Given the observed words x, the task of Bayesian inference is to compute the posterior distribution over the latent variables z, the mixing proportions θ and the topic parameters φ.
1. Variational Bayes
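Standard VB posits a posterior that factorizes over all unknowns and maximizes a lower bound on the marginal log likelihood; in the paper's setup the approximating family is

\hat{q}(\mathbf{z}, \theta, \phi) = q(\theta)\, q(\phi) \prod_{ij} q(z_{ij}), \qquad \log p(\mathbf{x}) \ge \mathbb{E}_{\hat{q}}[\log p(\mathbf{x}, \mathbf{z}, \theta, \phi)] + \mathcal{H}(\hat{q})

and each factor is updated in turn by coordinate ascent on this bound.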
Slide 7
Approximate inference for LDA (3/3)
2. Collapsed Gibbs sampling
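Collapsed Gibbs sampling integrates out θ and φ analytically and samples each topic assignment from its conditional given all the others; with symmetric priors α and β this conditional takes the standard form

p(z_{ij} = k \mid \mathbf{z}^{\neg ij}, \mathbf{x}) \propto \frac{(\alpha + n^{\neg ij}_{jk})\,(\beta + n^{\neg ij}_{k, x_{ij}})}{W\beta + n^{\neg ij}_{k}}

where n_{jk} counts the words in document j assigned to topic k, n_{k,w} counts the assignments of word w to topic k, n_k = \sum_w n_{k,w}, W is the vocabulary size, and the superscript ¬ij excludes word ij from the counts.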
Slide 8
Collapsed VB inference for LDA
- In the variational Bayesian approximation, we assume a factorized form for the posterior approximating distribution. However, this is not a good assumption, since changes in the model parameters (θ, φ) have a considerable impact on the latent variables (z).
- CVB is equivalent to marginalizing out the model parameters before approximating the posterior over the latent variables.
- The exact implementation of CVB has a closed form but is computationally too expensive to be practical. Therefore, the authors propose a simple Gaussian approximation, which seems to work very accurately.
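A minimal sketch of one CVB sweep under this Gaussian approximation. The update rule and the second-order correction follow the paper's description; the data layout, the function name cvb_update, and all variable names are illustrative assumptions, not the authors' code. Each count n is a sum of independent Bernoulli indicators under q(z), so E[n] = Σγ, Var[n] = Σγ(1-γ), and E[log(a+n)] ≈ log(a+E[n]) - Var[n]/(2(a+E[n])^2).

import numpy as np

def cvb_update(gamma, docs, W, alpha, beta):
    """One full CVB sweep with the Gaussian approximation (a sketch).

    gamma : list of (N_j, K) arrays; gamma[j][i, k] = q(z_ij = k), rows normalized
    docs  : list of 1-D int arrays; docs[j][i] = word id x_ij in [0, W)
    alpha, beta : symmetric Dirichlet hyperparameters
    """
    # Means and variances of the count statistics under q. Each count is a
    # sum of independent Bernoullis, so E[n] = sum(gamma), Var[n] = sum(gamma*(1-gamma)).
    mean_kw = np.zeros((W, gamma[0].shape[1]))
    var_kw = np.zeros_like(mean_kw)
    for g, d in zip(gamma, docs):
        np.add.at(mean_kw, d, g)
        np.add.at(var_kw, d, g * (1.0 - g))
    mean_k = mean_kw.sum(axis=0)
    var_k = var_kw.sum(axis=0)

    for g, d in zip(gamma, docs):
        mean_jk = g.sum(axis=0)
        var_jk = (g * (1.0 - g)).sum(axis=0)
        for i, w in enumerate(d):
            gi = g[i]
            # Counts with word ij's own contribution removed ("minus ij").
            m_jk = mean_jk - gi;       v_jk = var_jk - gi * (1.0 - gi)
            m_kw = mean_kw[w] - gi;    v_kw = var_kw[w] - gi * (1.0 - gi)
            m_k = mean_k - gi;         v_k = var_k - gi * (1.0 - gi)
            # Gaussian approximation:
            #   E[log(a + n)] ~ log(a + E[n]) - Var[n] / (2 (a + E[n])^2)
            log_q = (np.log(alpha + m_jk) - v_jk / (2.0 * (alpha + m_jk) ** 2)
                     + np.log(beta + m_kw) - v_kw / (2.0 * (beta + m_kw) ** 2)
                     - np.log(W * beta + m_k) + v_k / (2.0 * (W * beta + m_k) ** 2))
            gi = np.exp(log_q - log_q.max())
            gi /= gi.sum()
            # Put the updated word back into all count statistics.
            mean_jk = m_jk + gi;       var_jk = v_jk + gi * (1.0 - gi)
            mean_kw[w] = m_kw + gi;    var_kw[w] = v_kw + gi * (1.0 - gi)
            mean_k = m_k + gi;         var_k = v_k + gi * (1.0 - gi)
            g[i] = gi
    return gamma

Because only the means and variances of the count statistics are tracked, one sweep has the same order of cost as a sweep of collapsed Gibbs sampling, while remaining deterministic.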
Slide 9
Experimental results
Left: results for KOS (D = 3,430 documents; W = 6,909; N = 467,714 words).
Right: results for NIPS (D = 1,675 documents; W = 12,419; N = 2,166,029 words).
10% of the words were held out for testing; results are averaged over 50 random runs.
[Figures: variational bounds vs. number of iterations; log probabilities vs. number of iterations]
Slide 10
Conclusions
- Variational approximations are computationally much more efficient than Gibbs sampling, with almost no loss in accuracy.
- The CVB inference algorithm is easy to implement, computationally efficient (via the Gaussian approximation) and more accurate than standard VB.