Text Classification
Michal Rosen-Zvi, University of California, Irvine

Outline
The need for dimensionality reduction
Classification methods
  Naïve Bayes
The LDA model
Topics model and semantic representation
The Author-Topic Model
  Model assumptions
  Inference by Gibbs sampling
Results: applying the model to massive datasets

The need for dimensionality reduction
Content-based ranking: ranking the matching documents in a search engine according to their relevance to the user
Documents are represented as vectors in word space - the 'bag of words' representation
It is a sparse representation, and V >> |D|
There is a need to define conceptual closeness
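As a concrete illustration (not from the original slides), here is a minimal sketch of the bag-of-words representation; the tiny vocabulary and document are made up for the example.

```python
from collections import Counter

def bag_of_words(doc, vocabulary):
    """Map a document (string) to a sparse count vector over a fixed vocabulary."""
    counts = Counter(doc.lower().split())
    return {w: counts[w] for w in vocabulary if counts[w] > 0}

# Hypothetical toy example: the vector is sparse because most vocabulary words
# never appear in any single document (V is much larger than a document's length).
vocabulary = ["cat", "dog", "milk", "bread", "science", "research"]
doc = "the cat drank the milk while the dog watched the cat"
print(bag_of_words(doc, vocabulary))   # {'cat': 2, 'dog': 1, 'milk': 1}
```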

Feature vector representation
From: Modeling the Internet and the Web: Probabilistic Methods and Algorithms, by Pierre Baldi, Paolo Frasconi, and Padhraic Smyth

What is so special about text?
No obvious relation between features
High dimensionality (the vocabulary V is often larger than the number of documents)
Importance of speed

Classification: assigning words to topics
Different models for the data:
Discriminative classifier: models the boundaries between the different classes of the data and predicts a categorical output (e.g., SVM)
Density estimator: models the distribution of the data points themselves, i.e., a generative model (e.g., Naïve Bayes)

A spatial representation: Latent Semantic Analysis (Landauer & Dumais, 1997)
A high-dimensional space, though not as high as |V|
[Document/term count matrix: counts of words such as SCIENCE, RESEARCH, SOUL, and LOVE across Doc1, Doc2, Doc3, ...]
Applying SVD to the count matrix places each word as a single point in a semantic space
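A minimal sketch of the LSA idea (not from the slides): take a small document/term count matrix and use a truncated SVD to embed each word as a point in a low-dimensional semantic space. The toy count matrix below is made up for illustration.

```python
import numpy as np

# Hypothetical document/term count matrix: rows = words, columns = documents.
words = ["love", "soul", "research", "science"]
counts = np.array([
    [34, 3, 0],   # love
    [12, 2, 0],   # soul
    [0, 6, 19],   # research
    [0, 1, 16],   # science
], dtype=float)

# Truncated SVD: keep only the top K singular vectors.
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
K = 2
word_vectors = U[:, :K] * s[:K]   # each word becomes a point in a K-dimensional semantic space

for w, v in zip(words, word_vectors):
    print(f"{w:>8}: {np.round(v, 2)}")
```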

Where are we? Naïve Bayes
The need for dimensionality reduction
Classification methods
  Naïve Bayes
The LDA model
Topics model and semantic representation
The Author-Topic Model
  Model assumptions
  Inference by Gibbs sampling
Results: applying the model to massive datasets

The Naïve Bayes classifier
Assumes that, given the class, the features are distributed independently: $P(w_1, \ldots, w_N \mid c) = \prod_n P(w_n \mid c)$
This results in a trivial learning algorithm
It usually does not suffer from overfitting
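To make the assumption concrete, here is a small sketch (not from the slides) of Naïve Bayes classification: given per-class word probabilities, a document is assigned to the class with the highest posterior, computed as a sum of log probabilities. The class names and numbers are hypothetical.

```python
import math

# Hypothetical model: class priors and per-class word probabilities P(w | c).
priors = {"pets": 0.5, "food": 0.5}
word_probs = {
    "pets": {"dog": 0.4, "cat": 0.4, "milk": 0.1, "bread": 0.1},
    "food": {"dog": 0.05, "cat": 0.05, "milk": 0.4, "bread": 0.5},
}

def classify(words):
    """Pick the class maximizing log P(c) + sum_n log P(w_n | c)."""
    scores = {}
    for c in priors:
        scores[c] = math.log(priors[c]) + sum(math.log(word_probs[c][w]) for w in words)
    return max(scores, key=scores.get)

print(classify(["cat", "milk", "dog"]))   # -> "pets"
```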

Naïve Bayes classifier: words and topics
A set of labeled documents is given: {C_d, w_d : d = 1, ..., D}
Note: classes are mutually exclusive
[Example documents with class labels, e.g. one containing Pet, Dog, Cat, Milk and one containing Food, Dry, Bread, Milk, Eat]

Simple model for topics
Given the topic, the words are independent
The probability for a word, w, given a topic, z, is $\Phi_{wz}$
[Graphical model: topic label C and words W, with $N_d$ words in each of D documents]
$P(\{\mathbf{w}, C\} \mid \Phi) = \prod_{d=1}^{D} P(C_d) \prod_{n=1}^{N_d} P(w_{nd} \mid C_d, \Phi)$

Learning model parameters
Estimating $\Phi$ from the likelihood: here $\Phi_{wj}$ is the probability of word w given topic j, and $N_{wj}$ is the number of times word w is assigned to topic j
Under the normalization constraint $\sum_w \Phi_{wj} = 1$, one finds $\Phi_{wj} = N_{wj} / \sum_{w'} N_{w'j}$
Example of making use of the results: predicting the topic of a new document
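A hedged sketch (not from the slides) of this estimation step: count word/topic co-occurrences in labeled documents, normalize per topic to get an estimate of $\Phi$, and use it to score a new document. The toy corpus and labels below are hypothetical.

```python
import numpy as np

# Hypothetical labeled corpus: each document is a list of word indices, with a topic label.
V, T = 5, 2                      # vocabulary size, number of topics
docs = [([0, 0, 1, 2], 0), ([3, 4, 4, 2], 1), ([0, 1, 1], 0)]

# N[w, j] = number of times word w appears in documents labeled with topic j.
N = np.zeros((V, T))
for words, topic in docs:
    for w in words:
        N[w, topic] += 1

# Maximum-likelihood estimate: Phi[w, j] = N[w, j] / sum_w' N[w', j].
Phi = N / N.sum(axis=0, keepdims=True)

# Predict the topic of a new document (uniform prior over topics for simplicity).
new_doc = [0, 1, 2]
log_scores = np.log(Phi[new_doc] + 1e-12).sum(axis=0)
print("predicted topic:", int(np.argmax(log_scores)))
```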

Naïve Bayes, multinomial:
[Graphical model: classes C and words W, with $N_d$ words in each of D documents, parameters $\Phi$ and a prior]
$P(\{\mathbf{w}, C\}) = \int d\Phi \; \prod_d P(C_d) \prod_n P(w_{nd} \mid C_d, \Phi) \, P(\Phi)$
Generative parameters: $\Phi_{wj} = P(w \mid c = j)$
They must satisfy $\sum_w \Phi_{wj} = 1$; therefore the integration is over the simplex (the space of vectors with non-negative elements that sum to 1)
$\Phi$ might have a Dirichlet prior

Inferring model parameters
One can find the distribution of $\Phi$ by sampling
Making use of the MAP: this is a point estimation of the PDF, which under some conditions coincides with the mean of the posterior PDF
Sampling provides the full PDF

Where are we? The LDA model
The need for dimensionality reduction
Classification methods
  Naïve Bayes
The LDA model
Topics model and semantic representation
The Author-Topic Model
  Model assumptions
  Inference by Gibbs sampling
Results: applying the model to massive datasets

LDA: a generative model for topics
A model that assigns Dirichlet priors to multinomial distributions: Latent Dirichlet Allocation
Assumes that a document is a mixture of topics
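A hedged sketch (not from the slides) of LDA's generative story: for each document, draw a topic mixture θ from a Dirichlet(α); then for each word, draw a topic z from θ and a word from that topic's distribution φ_z. The sizes and hyperparameters below are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

V, T, n_docs, doc_len = 20, 3, 5, 10    # vocabulary, topics, documents, words per document
alpha, beta = 0.5, 0.1                  # symmetric Dirichlet hyperparameters

# Topic-word distributions phi_z ~ Dirichlet(beta).
phi = rng.dirichlet(np.full(V, beta), size=T)

corpus = []
for _ in range(n_docs):
    theta = rng.dirichlet(np.full(T, alpha))        # per-document topic mixture
    z = rng.choice(T, size=doc_len, p=theta)        # topic assignment for each word
    words = [rng.choice(V, p=phi[t]) for t in z]    # word drawn from its topic
    corpus.append(words)

print(corpus[0])
```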

LDA: inference
Fixing the parameters $\alpha$ and $\beta$ (assuming uniformity) and inferring the distribution of the latent variables:
Variational inference (Blei et al.)
Gibbs sampling (Griffiths & Steyvers)
Expectation propagation (Minka)

Sampling in the LDA model
The update rule for fixed $\alpha$ and $\beta$, integrating out $\theta$ and $\Phi$ (the collapsed Gibbs update of Griffiths & Steyvers):
$P(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \propto \frac{n^{(w_i)}_{-i,j} + \beta}{n^{(\cdot)}_{-i,j} + V\beta} \cdot \frac{n^{(d_i)}_{-i,j} + \alpha}{n^{(d_i)}_{-i,\cdot} + T\alpha}$
It provides point estimates of $\Phi$ and distributions over the latent variables, z
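A hedged, minimal sketch (not from the slides) of collapsed Gibbs sampling for LDA using the update above; the count bookkeeping is simplified and the corpus is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical corpus: each document is a list of word indices.
docs = [[0, 1, 2, 1], [3, 4, 3, 2], [0, 0, 4, 1]]
V, T = 5, 2
alpha, beta = 0.5, 0.1

# Random initial topic assignment for every word, plus count matrices.
z = [[rng.integers(T) for _ in doc] for doc in docs]
n_wt = np.zeros((V, T))             # word-topic counts
n_dt = np.zeros((len(docs), T))     # document-topic counts
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        n_wt[w, z[d][i]] += 1
        n_dt[d, z[d][i]] += 1

for sweep in range(50):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            n_wt[w, t] -= 1; n_dt[d, t] -= 1          # remove the current assignment
            # Unnormalized conditional P(z = t | rest); the per-document denominator
            # is constant across topics, so it drops out after normalization.
            p = (n_wt[w] + beta) / (n_wt.sum(axis=0) + V * beta) * (n_dt[d] + alpha)
            t = rng.choice(T, p=p / p.sum())          # resample the topic
            z[d][i] = t
            n_wt[w, t] += 1; n_dt[d, t] += 1

phi_hat = (n_wt + beta) / (n_wt.sum(axis=0) + V * beta)   # point estimate of Phi
print(np.round(phi_hat, 2))
```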

Making use of the topics model in cognitive science…
The need for dimensionality reduction
Classification methods
  Naïve Bayes
The LDA model
Topics model and semantic representation
The Author-Topic Model
  Model assumptions
  Inference by Gibbs sampling
Results: applying the model to massive datasets

The author-topic model
Automatically extract the topical content of documents
Learn the association of topics to the authors of documents
Propose a new efficient probabilistic topic model: the author-topic model
Some queries that the model should be able to answer:
  What topics does author X work on?
  Which authors work on topic X?
  What are interesting temporal patterns in topics?

The model assumptions
Each author is associated with a topic mixture
Each document is a mixture of topics
With multiple authors, the document will be a mixture of the topic mixtures of the coauthors
Each word in a text is generated from one topic and one author (potentially different for each word)

The generative process
Let's assume authors A1 and A2 collaborate and produce a paper
A1 has multinomial topic distribution $\theta_1$
A2 has multinomial topic distribution $\theta_2$
For each word in the paper:
  Sample an author x (uniformly) from {A1, A2}
  Sample a topic z from $\theta_x$
  Sample a word w from the multinomial distribution of topic z
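A hedged sketch (not from the slides) of this generative process for a single two-author paper; the distributions θ1, θ2 and φ are randomly generated stand-ins.

```python
import numpy as np

rng = np.random.default_rng(2)

V, T = 12, 4
theta = {"A1": rng.dirichlet(np.ones(T)),     # A1's topic mixture
         "A2": rng.dirichlet(np.ones(T))}     # A2's topic mixture
phi = rng.dirichlet(np.full(V, 0.1), size=T)  # topic-word distributions

paper = []
for _ in range(15):                           # 15 words in the toy paper
    x = rng.choice(["A1", "A2"])              # sample an author uniformly
    z = rng.choice(T, p=theta[x])             # sample a topic from that author's mixture
    w = rng.choice(V, p=phi[z])               # sample a word from that topic
    paper.append((x, z, w))

print(paper[:5])
```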

Inference in the author-topic model
Estimate x and z by Gibbs sampling (assignments of each word to an author and a topic)
Estimation is efficient: linear in the data size
Infer from each sample using point estimations:
  Author-topic distributions ($\Theta$)
  Topic-word distributions ($\Phi$)
View results at the author-topic model website [off-line]
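A hedged sketch (not from the slides) of the point-estimation step: given author-topic and topic-word count matrices from one Gibbs sample, form smoothed estimates of Θ and Φ. The counts and hyperparameters here are placeholders.

```python
import numpy as np

alpha, beta = 0.5, 0.01
A, T, V = 3, 2, 5   # authors, topics, vocabulary size

# Hypothetical counts from one Gibbs sample:
# n_at[a, t] = words assigned to author a and topic t
# n_tw[t, w] = words assigned to topic t and word w
n_at = np.array([[10, 2], [3, 9], [5, 5]], dtype=float)
n_tw = np.array([[8, 6, 3, 1, 0], [0, 2, 4, 7, 6]], dtype=float)

# Smoothed point estimates (posterior means of the Dirichlet-multinomial):
Theta = (n_at + alpha) / (n_at.sum(axis=1, keepdims=True) + T * alpha)  # P(topic | author)
Phi = (n_tw + beta) / (n_tw.sum(axis=1, keepdims=True) + V * beta)      # P(word | topic)

print(np.round(Theta, 2))
print(np.round(Phi, 2))
```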

Naïve Bayes: author model
Observed variables: the authors and the words of the document
Latent variables: the particular author that generated each word
The probability for a word given an author is multinomial, with a Dirichlet prior

Results: Perplexity
Lower perplexity indicates better generalization performance
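For reference (not from the slides), perplexity is typically computed on held-out documents as $\exp\left(-\sum_d \log p(\mathbf{w}_d) / \sum_d N_d\right)$. Below is a hedged sketch under a simple unigram-mixture assumption, with made-up inputs.

```python
import numpy as np

def perplexity(docs, theta, phi):
    """exp(- total held-out log-likelihood / total number of words).

    docs: list of lists of word indices; theta[d]: topic mixture of document d;
    phi[t, w]: word probabilities per topic. Lower is better.
    """
    total_loglik, total_words = 0.0, 0
    for d, doc in enumerate(docs):
        word_probs = theta[d] @ phi          # p(w | d) = sum_t theta[d, t] * phi[t, w]
        total_loglik += np.sum(np.log(word_probs[doc]))
        total_words += len(doc)
    return np.exp(-total_loglik / total_words)

# Hypothetical held-out data and fitted parameters.
docs = [[0, 1, 1, 2], [3, 2, 0]]
theta = np.array([[0.8, 0.2], [0.3, 0.7]])
phi = np.array([[0.5, 0.3, 0.1, 0.1], [0.1, 0.1, 0.4, 0.4]])
print(round(float(perplexity(docs, theta, phi)), 3))
```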

Results: Perplexity (cont.)

Perplexity and Ranking results

Perplexity and Ranking results (cont.)
Can the model predict the correct author?