Annealing Paths for the Evaluation of Topic Models
James Foulds, Padhraic Smyth
Department of Computer Science, University of California, Irvine*
*James Foulds has recently moved to the University of California, Santa Cruz

Motivation: Topic model extensions
– Structure, prior knowledge and constraints: sparse, nonparametric, correlated, tree-structured, time series, supervised, focused, determinantal, …
– Special-purpose models: authorship, scientific impact, political affiliation, conversational influence, networks, machine translation, …
– General-purpose models: Dirichlet multinomial regression (DMR), sparse additive generative (SAGE), structural topic model (STM), …

Motivation: Inference algorithms for topic models
– Optimization: EM, variational inference, collapsed variational inference, …
– Sampling: collapsed Gibbs sampling, Langevin dynamics, …
– Scaling up to ``big data'': stochastic algorithms, distributed algorithms, MapReduce, sparse data structures, …

Motivation: Which existing techniques should we use? Is my new model/algorithm better than previous methods?

Evaluating Topic Models
[Figure: fit a topic model on the training set, then predict the held-out test set and score the model by log Pr(test set | trained model)]

Evaluating Topic Models
Fitting these models took only a few hours on a single-core machine. Creating this plot required a cluster. (Foulds et al., 2013)

Why is this Difficult?
For every held-out document d, we need to estimate the probability the trained model assigns to it. We need to approximate possibly tens of thousands of intractable sums/integrals!
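For an LDA-style model with topic-word distributions $\Phi$ and document-level Dirichlet hyperparameter $\alpha$ (notation ours, chosen to match standard LDA), the per-document quantity being estimated is the marginal likelihood
\[
p(\mathbf{w}_d \mid \Phi, \alpha)
  \;=\; \sum_{\mathbf{z}_d} \int p(\boldsymbol{\theta}_d \mid \alpha)
        \prod_{i=1}^{N_d} p(z_{di} \mid \boldsymbol{\theta}_d)\, p(w_{di} \mid \phi_{z_{di}})
        \, d\boldsymbol{\theta}_d ,
\]
where the sum ranges over the $K^{N_d}$ possible topic assignments for the document.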

Annealed Importance Sampling (Neal, 2001)
– Scales up importance sampling to high-dimensional data, using MCMC
– Corrects for MCMC convergence failures using importance weights

Annealed Importance Sampling (Neal, 2001)
[Figure: a sequence of intermediate distributions interpolating between a low-"temperature" distribution and a high-"temperature" distribution]

Annealed Importance Sampling (Neal, 2001)
AIS yields:
– importance samples from the target distribution
– an estimate of the ratio of partition functions
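To make the procedure concrete, here is a minimal one-dimensional AIS sketch in Python. It is an illustrative toy example (Gaussian target, one Metropolis step per temperature), with names and details chosen by us, not the implementation used in the talk:

```python
# Toy AIS sketch: estimate Z_target / Z_init by annealing from an easy
# distribution toward the target (Neal, 2001). Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def log_f_init(x):
    # Unnormalized log density of the easy starting distribution: N(0, 1).
    return -0.5 * x ** 2

def log_f_target(x):
    # Unnormalized log density of the target: a narrower Gaussian centred at 3.
    return -0.5 * ((x - 3.0) / 0.5) ** 2

def log_f_beta(x, beta):
    # Geometric path between the two unnormalized densities.
    return (1.0 - beta) * log_f_init(x) + beta * log_f_target(x)

def ais_run(n_temps=1000, step=0.5):
    betas = np.linspace(0.0, 1.0, n_temps)
    x = rng.normal()          # exact sample from the initial distribution
    log_w = 0.0
    for t in range(1, n_temps):
        # Accumulate the importance weight before moving the sample.
        log_w += log_f_beta(x, betas[t]) - log_f_beta(x, betas[t - 1])
        # One Metropolis step leaving the current intermediate distribution invariant.
        proposal = x + step * rng.normal()
        if np.log(rng.random()) < log_f_beta(proposal, betas[t]) - log_f_beta(x, betas[t]):
            x = proposal
    return log_w

log_weights = np.array([ais_run() for _ in range(100)])
# E[w] estimates Z_target / Z_init; average the weights in log space for stability.
log_ratio = np.logaddexp.reduce(log_weights) - np.log(len(log_weights))
print("estimated Z_target / Z_init:", np.exp(log_ratio))  # true value here is 0.5
```

The weight update before each transition and the MCMC move that leaves the current intermediate distribution invariant are the two ingredients that make the averaged weights an unbiased estimate of the ratio of normalizing constants.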

AIS for Evaluating Topic Models (Wallach et al., 2009)
Draw from the prior, then anneal towards the posterior.
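In outline (notation ours, following the construction in Wallach et al., 2009), the intermediate distributions for a held-out document interpolate between the prior over topic assignments and the posterior:
\[
f_{\beta_t}(\mathbf{z}_d) \;\propto\; p(\mathbf{z}_d \mid \alpha)\, p(\mathbf{w}_d \mid \mathbf{z}_d, \Phi)^{\beta_t},
\qquad 0 = \beta_0 < \beta_1 < \dots < \beta_T = 1 .
\]
At $\beta_0 = 0$ this is the prior, which can be sampled exactly; at $\beta_T = 1$ it is proportional to the posterior, whose normalizing constant is the held-out likelihood $p(\mathbf{w}_d \mid \Phi, \alpha)$ that the AIS weights estimate.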

Insights
1. We are mainly interested in the relative performance of topic models.
2. AIS can provide estimates of the ratio of partition functions of any two distributions that we can anneal between.

A standard application of Annealed Importance Sampling (Neal, 2001)
[Figure: a path between a low-"temperature" distribution and a high-"temperature" distribution]

The Proposed Method: Ratio-AIS
Draw from Topic Model 2, then anneal towards Topic Model 1 (the intermediate distributions sit at a "medium temperature" between the two models).

Advantages of Ratio-AIS
Ratio-AIS avoids several sources of Monte Carlo error for comparing two models. The standard method
– estimates the denominator of a ratio even though it is a constant (= 1),
– uses different z's for the two models,
– and is run twice, introducing Monte Carlo noise each time.
An easy convergence check: anneal in the reverse direction to compute the reciprocal.

Annealing Paths Between Topic Models
– Geometric average of the two distributions
– Convex combination of the parameters
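Concretely, one natural way to write the two paths (notation ours), with $f_1$ and $f_2$ the unnormalized densities of a held-out document under topic models 1 and 2, and $\beta_t$ rising from 0 to 1:
\[
\text{geometric path:}\quad
f_{\beta_t}(\mathbf{z}_d) \;\propto\; f_1(\mathbf{z}_d)^{\beta_t}\, f_2(\mathbf{z}_d)^{1-\beta_t},
\qquad
\text{parameter path:}\quad
\Phi^{(t)} \;=\; \beta_t\,\Phi_1 + (1-\beta_t)\,\Phi_2 ,
\]
where the parameter path likewise interpolates the Dirichlet hyperparameters and scores the document under the interpolated model at each temperature.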

Efficiently Plotting Performance Per Iteration of the Learning Algorithm (Foulds et al., 2013)

Insights
1. We are mainly interested in the relative performance of topic models.
2. AIS can provide estimates of the ratio of partition functions of any two distributions that we can anneal between.
3. We can select the AIS intermediate distributions to be distributions of interest.
4. The sequence of models we reach during training is typically amenable to annealing:
– The early models are often low temperature.
– Each successive model is similar to the previous one.

Iteration-AIS
– Re-uses all previous computation
– Warm starts
– More annealing temperatures, for free
– Importance weights can be computed recursively
[Figure: anneal from the prior to the topic model at Iteration 1 (Wallach et al.), then from each iteration's model to the next (Iteration 2, …, Iteration N) using Ratio-AIS]

Comparing Very Similar Topic Models (ACL Corpus)

Comparing Very Similar Topic Models (ACL and NIPS)
[Figure: % accuracy]

Symmetric vs Asymmetric Priors (NIPS, 1000 temperatures or equiv.)
[Figure: correlation with a longer left-to-right run; variance of the estimate of relative log-likelihood]

Per-Iteration Evaluation, ACL Dataset

Conclusions
– Use Ratio-AIS for detailed document-level analysis.
– Run the annealing in both directions to check for convergence failures.
– Use Left-to-Right for corpus-level analysis.
– Use Iteration-AIS to evaluate training algorithms.

Future Directions
– The Ratio-AIS and Iteration-AIS ideas can potentially be applied to other models with intractable likelihoods or partition functions (e.g. RBMs, ERGMs).
– Other annealing paths may be possible.
– Evaluating topic models remains an important, computationally challenging problem.

Thank You! Questions?