Topic Models. Presented by Iulian Pruteanu, Friday, July 28th, 2006.

Outline
1. Introduction
2. Exchangeable topic models (L. Fei-Fei et al., CVPR 2005)
3. Dynamic topic models (D. Blei et al., ICML 2006)

Introduction
- Topic models are tools for automatically organizing, searching and browsing large collections (documents, images, etc.).
- The discovered patterns often reflect the underlying topics which, combined, form the corpus.
- Exchangeable (static) topic models: the words (patches) of each document (image) are assumed to be independently drawn from a mixture of multinomials; the mixture components (topics) are shared by all documents.
- Dynamic topic models: capture the evolution of topics in a sequentially organized corpus of documents (images).

Exchangeable topic models (CVPR 2005)
Used for learning natural scene categories. A key idea is to use intermediate representations (themes) before classifying scenes, which avoids using manually labeled or segmented images to train the system. Local regions are first clustered into different intermediate themes, and then into categories. No supervision is needed apart from a single category label per training image.
- The algorithm provides a principled approach to learning relevant intermediate representations of scenes, without supervision.
- The model is able to group categories of images into a sensible hierarchy.

Exchangeable topic models (CVPR 2005)

- A patch x is the basic unit of an image.
- An image is a sequence of N patches.
- A category is a collection of I images.
- Intermediate themes are K-dimensional unit vectors, where K is the total number of themes.
- The codebook fixes the total number of codewords.
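As a concrete illustration of the patch-to-codeword representation above, here is a minimal sketch of quantizing patch descriptors against a learned codebook so that an image becomes a sequence of codeword indices. All array sizes and the function name `quantize` are illustrative, not taken from the paper:

```python
import numpy as np

def quantize(descriptors, codebook):
    """Map each patch descriptor to the index of its nearest codeword."""
    # Squared Euclidean distance between every descriptor and every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)  # one codeword index per patch

rng = np.random.default_rng(0)
codebook = rng.normal(size=(5, 8))      # 5 codewords, 8-dim descriptors
descriptors = rng.normal(size=(12, 8))  # N = 12 patches in one image
words = quantize(descriptors, codebook)
print(words.shape)  # (12,): one codeword index per patch
```

In the paper the codebook itself is learned by clustering patch descriptors from the training set; the sketch assumes it is already given.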

Exchangeable topic models (CVPR 2005) Bayesian decision: for convenience, the prior over categories is always assumed to be a fixed uniform distribution.
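Spelled out (the slide's equations did not survive transcription; this is the standard form of the decision rule, writing $p(c)$ for the category prior):

```latex
c^{*} \;=\; \arg\max_{c}\; p(c \mid \mathbf{x})
       \;\propto\; p(\mathbf{x} \mid c)\, p(c)
```

With the uniform prior $p(c)$, this reduces to choosing the category under which the test image's patches are most likely.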

Exchangeable topic models (CVPR 2005) Learning: Variational inference:
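The slide's formulas were lost in transcription; the generic bound optimized in variational inference (standard material, not copied from the paper) is:

```latex
\log p(\mathbf{x}) \;\ge\;
\mathbb{E}_{q(\mathbf{z})}\!\left[\log p(\mathbf{x}, \mathbf{z})\right]
\;-\;
\mathbb{E}_{q(\mathbf{z})}\!\left[\log q(\mathbf{z})\right]
```

Maximizing this bound over the free parameters of $q$ is equivalent to minimizing $\mathrm{KL}\!\left(q(\mathbf{z}) \,\|\, p(\mathbf{z} \mid \mathbf{x})\right)$, i.e. making the tractable distribution $q$ as close as possible to the true posterior.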

Exchangeable topic models (CVPR 2005) Features and codebook:
1. Evenly sampled grid
2. Random sampling
3. Kadir & Brady saliency detector
4. Lowe's DoG detector

Exchangeable topic models (CVPR 2005) Experimental setup and results: A model for each category was obtained from the training images.

Exchangeable topic models (CVPR 2005) Experimental setup and results:

Exchangeable topic models (CVPR 2005) Experimental setup and results:


Dynamic topic models (ICML 2006)
Static topic model review. Each document (image) is assumed drawn from the following generative process:
1. Choose topic proportions from a distribution over the (K-1)-simplex, such as a Dirichlet.
2. For each word (patch): choose a topic assignment, then choose a word (patch) from that topic.
This process assumes that images (documents) are drawn exchangeably from the same set of topics. In a dynamic topic model, we suppose that the data is divided by time slice, for example by year. The images of each slice are modeled with a K-component topic model, where the topics associated with slice t evolve from the topics associated with slice t-1.
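The static generative process above can be sketched directly with standard LDA notation; all sizes and the uniform hyperparameters here are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, N = 3, 10, 20           # topics, vocabulary size, words per document

alpha = np.ones(K)            # Dirichlet hyperparameter on the (K-1)-simplex
topics = rng.dirichlet(np.ones(V), size=K)  # each row: a distribution over words

theta = rng.dirichlet(alpha)  # 1. topic proportions for this document
doc = []
for _ in range(N):            # 2. for each word (patch):
    z = rng.choice(K, p=theta)       #    choose a topic assignment
    w = rng.choice(V, p=topics[z])   #    choose a word from that topic
    doc.append(w)
print(len(doc))  # 20
```

Because the per-word draws depend only on theta and the shared topics, the words are exchangeable, which is exactly the assumption the dynamic model relaxes across time slices.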

Dynamic topic models (ICML 2006) Dynamic topic models: Extension of the logistic normal distribution to time-series simplex data
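Concretely, the ICML 2006 paper chains each topic's natural parameters through a Gaussian random walk and maps them back to the simplex with a logistic (softmax) transformation; reproduced here from the paper's notation, so treat the details as a sketch:

```latex
\beta_{t,k} \mid \beta_{t-1,k} \;\sim\; \mathcal{N}\!\left(\beta_{t-1,k},\, \sigma^{2} I\right),
\qquad
\pi(\beta_{t,k})_{w} \;=\; \frac{\exp(\beta_{t,k,w})}{\sum_{w'} \exp(\beta_{t,k,w'})}
```

Working in the natural-parameter space is what makes the time-series extension possible, since the simplex itself does not admit a convenient Gaussian random walk.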

Dynamic topic models (ICML 2006) Approximate inference: In the dynamic topic model, the latent variables are the topics, mixture proportions and topic indicators. The authors optimize the free parameters of a distribution over the latent variables so that this distribution is close, in KL divergence, to the true posterior. See the paper for the full derivations.

Dynamic topic models (ICML 2006) Experimental setup and results: A subset of 30,000 articles from the journal Science, 250 from each of the 120 years between 1881 and 2000. The corpus is made up of approximately 7.5 million words. To explore the corpus and its themes, a 20-component dynamic topic model was estimated.

Dynamic topic models (ICML 2006)

Discussion: A sequential topic model for discrete data was developed using Gaussian time series on the natural parameters of the multinomial topics and logistic normal topic proportion models. The most promising extension to the method presented here is to incorporate a model of how new topics in the collection appear or disappear over time, rather than assuming a fixed number of topics.

References:
1. Blei, D., Ng, A., and Jordan, M. (JMLR 2003), "Latent Dirichlet Allocation"
2. Blei, D., and Lafferty, J. D. (NIPS 2006), "Correlated Topic Models"
3. Fei-Fei, L., and Perona, P. (IEEE CVPR 2005), "A Bayesian Hierarchical Model for Learning Natural Scene Categories"
4. Blei, D., and Lafferty, J. D. (ICML 2006), "Dynamic Topic Models"