14.0 Linguistic Processing and Latent Topic Analysis


Latent Semantic Analysis (LSA)
(figure: latent topics T_k linking words and documents)

Latent Semantic Analysis (LSA) – Word-Document Matrix Representation
Vocabulary V of size M and corpus T of size N
– V = {w_1, w_2, ..., w_i, ..., w_M}, w_i: the i-th word, e.g. M = 2 × 10^4
– T = {d_1, d_2, ..., d_j, ..., d_N}, d_j: the j-th document, e.g. N = 10^5
– c_ij: number of times w_i occurs in d_j
– n_j: total number of words present in d_j
– t_i = Σ_j c_ij: total number of times w_i occurs in T
Word-Document Matrix W = [w_ij] (rows indexed by the words w_1 ... w_M, columns by the documents d_1 ... d_N)
– each row of W is an N-dim "feature vector" for a word w_i with respect to all documents d_j
– each column of W is an M-dim "feature vector" for a document d_j with respect to all words w_i

Latent Semantic Analysis (LSA)
Each matrix entry is an entropy-weighted, length-normalized count:
w_ij = (1 − ε_i) c_ij / n_j, where ε_i = −(1 / log N) Σ_j (c_ij / t_i) log(c_ij / t_i)
ε_i is the normalized entropy of w_i across the corpus: a word concentrated in a few documents (small ε_i) is weighted more heavily than a word spread evenly over all documents
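To make the counts c_ij, n_j, t_i and the entropy weighting above concrete, here is a minimal Python sketch; the tiny example corpus and the helper name build_word_document_matrix are invented for illustration and assume the entropy-weighted entries sketched above.

```python
import numpy as np

def build_word_document_matrix(docs):
    """Build an entropy-weighted word-document matrix W = [w_ij].

    docs: list of documents, each a list of word tokens.
    Returns (W, vocabulary), with one row per word and one column per
    document, and w_ij = (1 - eps_i) * c_ij / n_j.
    """
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    M, N = len(vocab), len(docs)

    # c_ij: number of times word i occurs in document j
    C = np.zeros((M, N))
    for j, d in enumerate(docs):
        for w in d:
            C[index[w], j] += 1

    n = C.sum(axis=0)   # n_j: total words in document j
    t = C.sum(axis=1)   # t_i: total occurrences of word i in the corpus

    # normalized entropy eps_i of each word across documents
    P = C / t[:, None]  # P[i, j] = c_ij / t_i
    with np.errstate(divide="ignore", invalid="ignore"):
        logP = np.where(P > 0, np.log(P), 0.0)
    eps = -(P * logP).sum(axis=1) / np.log(N)

    W = (1.0 - eps)[:, None] * C / n[None, :]
    return W, vocab

docs = [["latent", "topic", "model"],
        ["topic", "model", "text"],
        ["speech", "recognition", "text"]]
W, vocab = build_word_document_matrix(docs)
print(W.shape)  # (M, N)
```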

Dimensionality Reduction (1/2)
– W W^T is an M × M matrix; its (i, j) element is the inner product of the i-th and j-th rows of W, a "similarity" between words w_i and w_j
– eigen-decomposition: W W^T = Σ_i s_i^2 e_i e_i^T, with orthonormal eigenvectors e_i and eigenvalues s_i^2
– dimensionality reduction: selection of the R largest eigenvalues (R = 800, for example)
– R "concepts" or "latent semantic concepts"

Dimensionality Reduction (2/2)
– W^T W is an N × N matrix; its (i, j) element is the inner product of the i-th and j-th columns of W, a "similarity" between documents d_i and d_j
– eigen-decomposition: W^T W = Σ_i s_i^2 e'_i e'_i^T, where the s_i^2 are weights (the significance of the "component matrices" e'_i e'_i^T)
– dimensionality reduction: selection of the R largest eigenvalues
– R "concepts" or "latent semantic concepts"

Singular Value Decomposition (SVD)
W = U S V^T, where U is the M × R left singular matrix, S = diag(s_1, ..., s_R) is R × R, and V is the N × R right singular matrix
– s_i: singular values, s_1 ≥ s_2 ≥ ... ≥ s_R
Vectors for word w_i: u_i S (u_i: the i-th row of U)
– a feature vector of dimensionality N is reduced to a vector u_i S of dimensionality R
– the N-dimensional space defined by the N documents is reduced to an R-dimensional space defined by R "concepts"
– the R row vectors of V^T (column vectors of V), i.e. the eigenvectors {e'_1, ..., e'_R}, are the R orthonormal basis vectors of the "latent semantic space" of dimensionality R, in which u_i S is represented
– words with similar "semantic concepts" have "closer" locations in the "latent semantic space": they tend to appear in similar "types" of documents, although not necessarily in exactly the same documents
(figure: the M × N matrix W = [w_ij] factored into U (M × R), S (R × R, diagonal s_1 ... s_R), and V^T (R × N); row u_i of U corresponds to word w_i, row v_j of V to document d_j)
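A minimal sketch of the truncated SVD step using numpy; the random stand-in matrix, the helper name truncated_svd, and R = 10 are illustrative choices, not values from the slides.

```python
import numpy as np

def truncated_svd(W, R):
    """Rank-R SVD of the word-document matrix W (M x N).

    Returns U_R (M x R), S_R (R,), Vt_R (R x N) so that
    W is approximated by U_R @ diag(S_R) @ Vt_R.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :R], S[:R], Vt[:R, :]

# example: random stand-in for a word-document matrix
W = np.random.rand(200, 50)
R = 10
U_R, S_R, Vt_R = truncated_svd(W, R)

# reduced word vectors: row i of U scaled by the singular values (u_i S)
word_vectors = U_R * S_R       # shape (M, R)
# reduced document vectors: row j of V scaled by S (v_j S)
doc_vectors = Vt_R.T * S_R     # shape (N, R)
```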

Singular Value Decomposition (SVD)
(figure: fold-in of a new document d_p, d_p = U S v_p^T, just as a column in W = U S V^T)

Singular Value Decomposition (SVD)
Vectors for document d_j: v_j S (v_j: the j-th row of V; equivalently S v_j^T as a column)
– a feature vector of dimensionality M is reduced to a vector v_j S of dimensionality R
– the M-dimensional space defined by the M words is reduced to an R-dimensional space defined by R "concepts"
– the R columns of U, i.e. the eigenvectors {e_1, ..., e_R}, are the R orthonormal basis vectors of the "latent semantic space" of dimensionality R, in which v_j S is represented
– documents with similar "semantic concepts" have "closer" locations in the "latent semantic space": they tend to include similar "types" of words, although not necessarily exactly the same words
The association structure between words w_i and documents d_j is preserved while noisy information is deleted, and the dimensionality is reduced to a common set of R "concepts"
(figure: W = [w_ij] = U S V^T, as above)

Example Applications in Linguistic Processing
Word Clustering
– example applications: class-based language modeling, information retrieval, etc.
– words with similar "semantic concepts" have "closer" locations in the "latent semantic space": they tend to appear in similar "types" of documents, although not necessarily in exactly the same documents
– each component of the reduced word vector u_i S is the "association" of the word with the corresponding "concept"
– example similarity measure between two words: the cosine of their reduced vectors, sim(w_i, w_j) = u_i S^2 u_j^T / (‖u_i S‖ ‖u_j S‖)
Document Clustering
– example applications: clustered language modeling, language model adaptation, information retrieval, etc.
– documents with similar "semantic concepts" have "closer" locations in the "latent semantic space": they tend to include similar "types" of words, although not necessarily exactly the same words
– each component of the reduced document vector v_j S is the "association" of the document with the corresponding "concept"
– example similarity measure between two documents: the cosine of their reduced vectors, sim(d_i, d_j) = v_i S^2 v_j^T / (‖v_i S‖ ‖v_j S‖)
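A small sketch of the similarity measures above, assuming cosine similarity over the reduced vectors u_i S and v_j S; the arrays word_vectors and doc_vectors follow the conventions of the earlier SVD sketch.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two R-dim latent-space vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def word_similarity(word_vectors, i, i2):
    """Similarity of words i and i2 in the latent semantic space."""
    return cosine(word_vectors[i], word_vectors[i2])

def doc_similarity(doc_vectors, j, j2):
    """Similarity of documents j and j2 in the latent semantic space."""
    return cosine(doc_vectors[j], doc_vectors[j2])
```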

LSA for Linguistic Processing

Example Applications in Linguistic Processing
Information Retrieval
– "concept matching" vs. "lexical matching": relevant documents are associated with similar "concepts", but may not include exactly the same words
– example approach: treat the query as a new document (by "folding-in") and evaluate its "similarity" with all possible documents
Fold-in
– consider a new document outside the training corpus T, but with similar language patterns or "concepts"
– construct a new column d_p, p > N, with respect to the M words
– assuming U and S remain unchanged: d_p = U S v_p^T (just as a column in W = U S V^T)
– v_p S = d_p^T U is an R-dim representation of the new document (i.e. the projection of d_p onto the basis vectors e_i given by the columns of U, obtained by inner products)
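A sketch of fold-in plus a simple retrieval-by-similarity step built on it; d_p is assumed to be the new document's raw M-dim vector built with the same weighting as W, and U_R and doc_vectors follow the earlier SVD sketch.

```python
import numpy as np

def fold_in(d_p, U_R):
    """Project a new document's M-dim vector d_p into the R-dim
    latent semantic space: v_p S = d_p^T U."""
    return d_p @ U_R               # shape (R,)

def retrieve(query_vec, doc_vectors):
    """Rank training documents by cosine similarity to the folded-in query."""
    q = query_vec / np.linalg.norm(query_vec)
    D = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    scores = D @ q
    return np.argsort(-scores)     # document indices, best first
```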

Integration with N-gram Language Models
Language Modeling for Speech Recognition – Prob(w_q | d_{q-1})
– w_q: the q-th word in the current document to be recognized (q: sequence index)
– d_{q-1}: the recognized history in the current document
– v_{q-1} = d_{q-1}^T U: representation of d_{q-1} by v_{q-1} (folded-in, as above)
– Prob(w_q | d_{q-1}) can be estimated from u_q and v_{q-1} in the R-dim space
Integration with N-gram: Prob(w_q | H_{q-1}) = Prob(w_q | h_{q-1}, d_{q-1})
– H_{q-1}: the history up to w_{q-1}; h_{q-1}: the local N-gram history (the preceding N − 1 words)
– the N-gram gives local relationships, while d_{q-1} gives semantic concepts
– d_{q-1} emphasizes the key content words, while the N-gram counts all words similarly, including function words
v_{q-1} for d_{q-1} can be estimated iteratively
– assuming the q-th word in the current document is w_i, d_q is obtained from d_{q-1} by updating its i-th component (out of M); v_q then moves in the R-dim space and eventually settles down somewhere
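The transcript does not preserve the exact combination formula, so the sketch below shows one simple possibility: turn latent-space similarities into a word distribution with a softmax and linearly interpolate it with the N-gram probability. The softmax choice and the weight lam are assumptions, not the method of the cited paper.

```python
import numpy as np

def lsa_word_probs(word_vectors, v_hist):
    """Turn cosine similarities between each reduced word vector u_i S and
    the folded-in history vector v_{q-1} into a distribution over the
    vocabulary (softmax over similarities; one simple choice)."""
    W = word_vectors / np.linalg.norm(word_vectors, axis=1, keepdims=True)
    v = v_hist / np.linalg.norm(v_hist)
    sims = W @ v
    e = np.exp(sims - sims.max())
    return e / e.sum()

def combined_prob(w_idx, ngram_prob, lsa_probs, lam=0.5):
    """Linear interpolation of the N-gram probability (local constraints)
    with the LSA probability (global semantic concepts)."""
    return lam * ngram_prob + (1.0 - lam) * lsa_probs[w_idx]
```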

Probabilistic Latent Semantic Analysis (PLSA)
With the same goal as LSA: a set of latent topics {T_k, k = 1, 2, ..., K} is used to construct a new relationship between the documents and terms, but within a probabilistic framework
P(t_j | D_i) = Σ_k P(t_j | T_k) P(T_k | D_i)
Trained with EM by maximizing the total likelihood
L_T = Σ_i Σ_j n(t_j, D_i) log P(t_j | D_i)
– n(t_j, D_i): frequency count of term t_j in the document D_i
– D_i: documents; T_k: latent topics; t_j: terms
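A compact sketch of the EM loop for PLSA under the formulation above; n is the term-document count matrix n(t_j, D_i) with one row per document, and K, n_iter, and the small smoothing constant are illustrative choices.

```python
import numpy as np

def train_plsa(n, K, n_iter=50, seed=0):
    """EM training for PLSA.

    n: (D, V) matrix of counts n(t_j, D_i).
    Returns theta (D, K) = P(T_k | D_i) and phi (K, V) = P(t_j | T_k).
    """
    rng = np.random.default_rng(seed)
    D, V = n.shape
    theta = rng.random((D, K)); theta /= theta.sum(axis=1, keepdims=True)
    phi = rng.random((K, V));   phi /= phi.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: posterior P(T_k | D_i, t_j), shape (D, V, K)
        post = theta[:, None, :] * phi.T[None, :, :]
        post /= post.sum(axis=2, keepdims=True) + 1e-12

        # M-step: re-estimate phi and theta from expected counts
        weighted = n[:, :, None] * post            # n(t_j, D_i) * posterior
        phi = weighted.sum(axis=0).T               # (K, V)
        phi /= phi.sum(axis=1, keepdims=True) + 1e-12
        theta = weighted.sum(axis=1)               # (D, K)
        theta /= theta.sum(axis=1, keepdims=True) + 1e-12
    return theta, phi
```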

Probabilistic Latent Semantic Analysis (PLSA)
(graphical model in plate notation; w: word, z: topic, d: document, N: number of words in document d, M: number of documents in the corpus)

Latent Dirichlet Allocation (LDA) (k: topic index, a total of K topics)
– a document is represented as a random mixture over latent topics
– each topic is characterized by a distribution over words
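A sketch of the generative story the slide describes, drawing the per-document topic mixtures and the topic-word distributions from Dirichlet priors; the hyperparameters alpha, beta and the corpus sizes are illustrative assumptions.

```python
import numpy as np

def generate_corpus(K=5, V=1000, n_docs=100, doc_len=50,
                    alpha=0.1, beta=0.01, seed=0):
    """Generate documents according to the LDA generative story:
    each topic k has a word distribution phi_k ~ Dir(beta);
    each document has a topic mixture theta ~ Dir(alpha);
    each word is drawn by sampling a topic z, then a word from phi_z."""
    rng = np.random.default_rng(seed)
    phi = rng.dirichlet([beta] * V, size=K)        # K topic-word distributions
    corpus = []
    for _ in range(n_docs):
        theta = rng.dirichlet([alpha] * K)         # per-document topic mixture
        z = rng.choice(K, size=doc_len, p=theta)   # topic of each word position
        words = [rng.choice(V, p=phi[k]) for k in z]
        corpus.append(words)
    return corpus, phi
```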

Gibbs Sampling in general
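As a generic stand-in (the specific example from the original slide is not preserved in the transcript), the sketch below runs Gibbs sampling on a 2-D Gaussian, alternately drawing each coordinate from its conditional given the other.

```python
import numpy as np

def gibbs_bivariate_normal(rho=0.8, n_samples=5000, seed=0):
    """Gibbs sampling from a standard bivariate normal with correlation rho:
    x | y ~ N(rho * y, 1 - rho^2) and y | x ~ N(rho * x, 1 - rho^2)."""
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0
    samples = []
    for _ in range(n_samples):
        x = rng.normal(rho * y, np.sqrt(1 - rho ** 2))
        y = rng.normal(rho * x, np.sqrt(1 - rho ** 2))
        samples.append((x, y))
    return np.array(samples)
```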

Gibbs Sampling applied on LDA
Sample P(Z, W) (figure: Doc 1 with words w_11, w_12, w_13, ..., w_1n and Doc 2 with words w_21, w_22, w_23, ..., w_2n, each word carrying a topic assignment Z):
1. Random initialization: assign a random topic to every word
2. Erase Z_11, and draw a new Z_11 ~ P(Z_11 | Z_¬11, W)
3. Erase Z_12, and draw a new Z_12 ~ P(Z_12 | Z_¬12, W)
4. Iteratively update the topic assignment for each word until convergence
5. Compute θ and φ according to the final assignment
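A compact sketch of the collapsed Gibbs sampler the slides step through: erase one assignment at a time, redraw it from its conditional given all other assignments, iterate, and finally read off θ and φ. The count arrays and the hyperparameters alpha, beta are the usual LDA quantities, with illustrative defaults.

```python
import numpy as np

def lda_gibbs(corpus, V, K, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    corpus: list of documents, each a list of word ids in [0, V).
    Returns theta (docs x K) and phi (K x V) from the final assignment.
    """
    rng = np.random.default_rng(seed)
    D = len(corpus)
    ndk = np.zeros((D, K))          # topic counts per document
    nkw = np.zeros((K, V))          # word counts per topic
    nk = np.zeros(K)                # total words per topic
    z = [rng.integers(K, size=len(doc)) for doc in corpus]  # 1. random init
    for d, doc in enumerate(corpus):
        for i, w in enumerate(doc):
            ndk[d, z[d][i]] += 1; nkw[z[d][i], w] += 1; nk[z[d][i]] += 1

    for _ in range(n_iter):                                  # 4. iterate
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                k = z[d][i]                                  # 2./3. erase current Z
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())             # draw new Z ~ P(Z | rest, W)
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

    # 5. compute theta and phi from the final assignment
    theta = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)
    phi = (nkw + beta) / (nkw + beta).sum(axis=1, keepdims=True)
    return theta, phi
```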

Matrix Factorization (MF) for Recommendation Systems
(table: a sparse user-item rating matrix, Users A–G × Movies 1–9; only a few ratings are observed, e.g. 4.1, 3.3, 2.9, 2.6, 2.7, and the remaining entries are missing and to be predicted)

Matrix Factorization (MF)
Mapping both users and items to a joint latent factor space of dimensionality f
(figure: the U × I rating matrix approximated by the product of a U × f user-factor matrix and an f × I item-factor matrix; a latent factor may correspond to an interpretable dimension such as "towards male", "seriousness", etc.)

Matrix Factorization (MF)
– each user u is represented by a vector p_u and each item i by a vector q_i in the f-dim latent factor space
– a rating is predicted as the inner product r̂_ui = q_i^T p_u
– p_u and q_i are learned by minimizing the squared error between predicted and observed ratings over the known entries
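A minimal SGD sketch of plain matrix factorization over observed (user, item, rating) triples; the learning rate lr, regularization reg, factor dimension f, and the tiny example ratings list are illustrative assumptions.

```python
import numpy as np

def train_mf(ratings, n_users, n_items, f=20, lr=0.01, reg=0.05,
             n_epochs=30, seed=0):
    """Learn user factors p_u and item factors q_i so that r_ui ~ p_u . q_i,
    by stochastic gradient descent on the regularized squared error."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, f))
    Q = 0.1 * rng.standard_normal((n_items, f))
    for _ in range(n_epochs):
        for u, i, r in ratings:
            pu, qi = P[u].copy(), Q[i].copy()
            err = r - pu @ qi
            P[u] += lr * (err * qi - reg * pu)
            Q[i] += lr * (err * pu - reg * qi)
    return P, Q

# usage: ratings is a list of (user_index, item_index, rating) triples
ratings = [(0, 0, 4.1), (1, 2, 3.3), (2, 1, 2.9)]
P, Q = train_mf(ratings, n_users=3, n_items=3)
print(P[0] @ Q[2])   # predicted rating of user 0 for item 2
```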

Overfitting Problem
A good model does not merely fit all the training data
– it needs to cover unseen data well, which may have a distribution slightly different from that of the training data
– overly complicated models with too many parameters usually lead to overfitting

Extensions of Matrix Factorization (MF)
Biased MF
– add a global bias μ (usually the average rating), a user bias b_u, and an item bias b_i as parameters
Non-negative Matrix Factorization (NMF)
– restrict every component of p_u and q_i to be non-negative
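A sketch of biased MF, extending the plain MF sketch above with the prediction r̂_ui = μ + b_u + b_i + p_u · q_i and SGD updates for the bias terms; parameter values remain illustrative.

```python
import numpy as np

def train_biased_mf(ratings, n_users, n_items, f=20, lr=0.01, reg=0.05,
                    n_epochs=30, seed=0):
    """Biased matrix factorization: r_ui ~ mu + b_u + b_i + p_u . q_i."""
    rng = np.random.default_rng(seed)
    mu = np.mean([r for _, _, r in ratings])   # global bias: average rating
    bu = np.zeros(n_users)                     # user biases
    bi = np.zeros(n_items)                     # item biases
    P = 0.1 * rng.standard_normal((n_users, f))
    Q = 0.1 * rng.standard_normal((n_items, f))
    for _ in range(n_epochs):
        for u, i, r in ratings:
            pred = mu + bu[u] + bi[i] + P[u] @ Q[i]
            err = r - pred
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            pu, qi = P[u].copy(), Q[i].copy()
            P[u] += lr * (err * qi - reg * pu)
            Q[i] += lr * (err * pu - reg * qi)
    return mu, bu, bi, P, Q
```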

References
LSA and PLSA
– "Exploiting Latent Semantic Information in Statistical Language Modeling", Proceedings of the IEEE, Aug. 2000
– "Latent Semantic Mapping", IEEE Signal Processing Magazine, Sept. 2005, Special Issue on Speech Technology in Human-Machine Communication
– "Probabilistic Latent Semantic Indexing", ACM Special Interest Group on Information Retrieval (ACM SIGIR), 1999
– "Probabilistic Latent Semantic Analysis", Proc. of Uncertainty in Artificial Intelligence (UAI), 1999
– "Spoken Document Understanding and Organization", IEEE Signal Processing Magazine, Sept. 2005, Special Issue on Speech Technology in Human-Machine Communication
LDA and Gibbs Sampling
– Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
– David M. Blei, Andrew Y. Ng, and Michael I. Jordan, "Latent Dirichlet Allocation", Journal of Machine Learning Research, 2003
– Gregor Heinrich, "Parameter Estimation for Text Analysis", 2005

References
Matrix Factorization
– "A Linear Ensemble of Individual and Blended Models for Music Rating Prediction", JMLR W&CP, volume 18
– Y. Koren, R. M. Bell, and C. Volinsky, "Matrix Factorization Techniques for Recommender Systems", Computer, 42(8):30–37
– "Introduction to Matrix Factorization Methods Collaborative Filtering" (…ltering.Factorization.pdf)
– GraphLab API: Collaborative Filtering
– J. Mairal, F. Bach, J. Ponce, and G. Sapiro, "Online Learning for Matrix Factorization and Sparse Coding", Journal of Machine Learning Research, 2010