10.0 Latent Semantic Analysis for Linguistic Processing

References:
1. "Exploiting Latent Semantic Information in Statistical Language Modeling", Proceedings of the IEEE, Aug. 2000
2. "Latent Semantic Mapping", IEEE Signal Processing Magazine, Sept. 2005, Special Issue on Speech Technology in Human-Machine Communication
3. Golub & Van Loan, "Matrix Computations"
4. "Probabilistic Latent Semantic Indexing", ACM Special Interest Group on Information Retrieval (ACM SIGIR), 1999
5. "Probabilistic Latent Semantic Analysis", Proc. of Uncertainty in Artificial Intelligence (UAI), 1999
6. "Spoken Document Understanding and Organization", IEEE Signal Processing Magazine, Sept. 2005, Special Issue on Speech Technology in Human-Machine Communication

Word-Document Matrix Representation
Vocabulary V of size M and Corpus T of size N
–V = {w_1, w_2, ..., w_i, ..., w_M}, w_i: the i-th word, e.g. M = 2×10^4
–T = {d_1, d_2, ..., d_j, ..., d_N}, d_j: the j-th document, e.g. N = 10^5
–c_ij: number of times w_i occurs in d_j
–n_j: total number of words present in d_j
–t_i = Σ_j c_ij: total number of times w_i occurs in T
Word-Document Matrix W
–W = [w_ij], an M×N matrix with one row per word w_i and one column per document d_j
–each row of W is an N-dim "feature vector" for a word w_i with respect to all documents d_j; each column of W is an M-dim "feature vector" for a document d_j with respect to all words w_i
[Figure: the matrix W, rows labeled w_1, w_2, ..., w_i, ..., w_M, columns labeled d_1, d_2, ..., d_j, ..., d_N, with entry w_ij]
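
As a concrete illustration, below is a minimal Python/numpy sketch of assembling such a word-document matrix from a toy corpus. The three-document corpus, the variable names, and the use of the raw counts c_ij as the entries w_ij are assumptions for illustration only; in practice the entries are usually weighted (e.g. normalized by the document length n_j or by an entropy-based term).

```python
# Minimal sketch (assumption: raw counts are used as the entries w_ij;
# weighted counts could be substituted without changing the structure).
from collections import Counter
import numpy as np

corpus = [                                   # toy corpus T of N = 3 "documents" (hypothetical)
    "the cat sat on the mat",
    "the dog sat on the log",
    "stock prices fell sharply today",
]
docs = [d.split() for d in corpus]
vocab = sorted({w for d in docs for w in d}) # vocabulary V of size M
word_index = {w: i for i, w in enumerate(vocab)}

M, N = len(vocab), len(docs)
W = np.zeros((M, N))                         # word-document matrix, M x N
for j, d in enumerate(docs):
    for w, c_ij in Counter(d).items():       # c_ij: number of times w_i occurs in d_j
        W[word_index[w], j] = c_ij

n_j = W.sum(axis=0)                          # total number of words in each document d_j
t_i = W.sum(axis=1)                          # total occurrences of each word w_i in T
print(W.shape, n_j, t_i)
```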

Dimensionality Reduction
–W W^T is an M×M matrix whose (i, j) element is the inner product of the i-th and j-th rows of W, a "similarity" between words w_i and w_j; its eigen-decomposition is W W^T = Σ_i s_i^2 e_i e_i^T
–dimensionality reduction: selection of the R largest eigenvalues (R = 800 for example)
–W^T W is an N×N matrix whose (i, j) element is the inner product of the i-th and j-th columns of W, a "similarity" between documents d_i and d_j; its eigen-decomposition is W^T W = Σ_i s_i^2 e'_i e'_i^T
–dimensionality reduction: selection of the R largest eigenvalues
–s_i^2: weights (significance of the "component matrices" e_i e_i^T and e'_i e'_i^T)
–the R retained components correspond to R "concepts" or "latent semantic concepts"
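
The relation between these eigen-decompositions and the singular values can be checked numerically. The sketch below continues with the toy matrix W from the previous sketch (R = 2 is chosen only to fit the small example; the slide's R = 800 applies to realistically sized corpora) and verifies that the R largest eigenvalues of W^T W are the squared singular values s_i^2, with eigenvectors equal to the columns of V up to sign.

```python
# Sketch: the R largest eigenvalues of W^T W (equivalently W W^T) are the
# squared singular values s_i^2; the eigenvectors e'_i match V's columns up to sign.
import numpy as np

eigvals, eigvecs = np.linalg.eigh(W.T @ W)   # N x N symmetric matrix of document similarities
order = np.argsort(eigvals)[::-1]            # sort eigenvalues in descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

R = 2                                        # keep the R largest "concepts" (toy value)
s_squared = eigvals[:R]                      # the weights s_i^2
E_prime = eigvecs[:, :R]                     # eigenvectors e'_1 ... e'_R

s = np.linalg.svd(W, compute_uv=False)       # singular values of W, descending
print(np.allclose(np.sqrt(np.clip(s_squared, 0, None)), s[:R]))  # True: s_i^2 are the eigenvalues
```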

Singular Value Decomposition (SVD)
–W = U S V^T, where U is M×R, S is R×R diagonal, and V^T is R×N
–s_i: singular values, s_1 ≥ s_2 ≥ ... ≥ s_R
–U: left singular matrix, V: right singular matrix
Vectors for words: the reduced vector u_i S (u_i: the i-th row of U)
–a vector with dimensionality N is reduced to a vector u_i S with dimensionality R
–the N-dimensional space defined by the N documents is reduced to an R-dimensional space defined by the R "concepts"
–the R row vectors of V^T (i.e. the columns of V, the eigenvectors {e'_1, ..., e'_R}) are the R orthonormal basis vectors of the "latent semantic space" with dimensionality R, in which u_i S is represented
–words with similar "semantic concepts" have "closer" locations in the "latent semantic space": they tend to appear in similar "types" of documents, although not necessarily in exactly the same documents
[Figure: W (M×N, rows w_1 ... w_M, columns d_1 ... d_N, entries w_ij) = U (M×R, rows u_i) × S (R×R diagonal, s_1 ... s_R) × V^T (R×N, one column v_j^T per document d_j)]
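
A sketch of the truncated SVD and the resulting R-dim word vectors u_i S, continuing with the toy W above (R = 2 is again an assumption for the small example):

```python
# Sketch: truncated SVD W ≈ U S V^T and the R-dim word vectors u_i S.
import numpy as np

R = 2                                                            # number of latent "concepts"
U_full, s_full, Vt_full = np.linalg.svd(W, full_matrices=False)
U, S, Vt = U_full[:, :R], np.diag(s_full[:R]), Vt_full[:R, :]    # M x R, R x R, R x N

word_vecs = U @ S            # row i is u_i S, the R-dim vector for word w_i
print(word_vecs.shape)       # (M, R): each word mapped from N dimensions down to R
```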

Singular Value Decomposition (SVD)
Vectors for documents: the reduced vector v_j S (v_j: the j-th row of V; as a column, S v_j^T)
–a vector with dimensionality M is reduced to a vector v_j S with dimensionality R
–the M-dimensional space defined by the M words is reduced to an R-dimensional space defined by the R "concepts"
–the R columns of U (the eigenvectors {e_1, ..., e_R}) are the R orthonormal basis vectors of the "latent semantic space" with dimensionality R, in which v_j S is represented
–documents with similar "semantic concepts" have "closer" locations in the "latent semantic space": they tend to include similar "types" of words, although not necessarily exactly the same words
The association structure between words w_i and documents d_j is preserved, with noisy information deleted, while the dimensionality is reduced to a common set of R "concepts"
[Figure: the same decomposition W = U S V^T, highlighting the column v_j^T of V^T for document d_j]
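
Continuing the same sketch, the document vectors v_j S come from the rows of V (columns of V^T), and the rank-R product U S V^T gives the smoothed approximation of W described above:

```python
# Sketch: R-dim document vectors v_j S and the rank-R approximation of W.
import numpy as np

doc_vecs = Vt.T @ S          # row j is v_j S, the R-dim vector for document d_j
print(doc_vecs.shape)        # (N, R): each document mapped from M dimensions down to R

W_hat = U @ S @ Vt           # association structure preserved, noisy information removed
print(np.linalg.norm(W - W_hat))   # small residual: how much was discarded
```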

Example Applications in Linguistic Processing
Word Clustering
–example applications: class-based language modeling, information retrieval, etc.
–words with similar "semantic concepts" have "closer" locations in the "latent semantic space": they tend to appear in similar "types" of documents, although not necessarily in exactly the same documents
–each component of the reduced word vector u_i S is the "association" of the word with the corresponding "concept"
–example similarity measure between two words: sim(w_i, w_k) = cos(u_i S, u_k S) = u_i S^2 u_k^T / (|u_i S| |u_k S|)
Document Clustering
–example applications: clustered language modeling, language model adaptation, information retrieval, etc.
–documents with similar "semantic concepts" have "closer" locations in the "latent semantic space": they tend to include similar "types" of words, although not necessarily exactly the same words
–each component of the reduced document vector v_j S is the "association" of the document with the corresponding "concept"
–example "similarity" measure between two documents: sim(d_j, d_k) = cos(v_j S, v_k S) = v_j S^2 v_k^T / (|v_j S| |v_k S|)
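
A sketch of these two similarity measures as plain cosine similarities in the R-dim space, reusing U, S and Vt from the SVD sketch above (the function names are hypothetical):

```python
# Sketch of the word and document similarity measures above.
import numpy as np

def word_similarity(i, k, U, S):
    """sim(w_i, w_k) = cos(u_i S, u_k S)."""
    a, b = U[i] @ S, U[k] @ S
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def doc_similarity(j, k, Vt, S):
    """sim(d_j, d_k) = cos(v_j S, v_k S)."""
    a, b = Vt[:, j] @ S, Vt[:, k] @ S        # v_j is the j-th column of V^T
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(doc_similarity(0, 1, Vt, S), doc_similarity(0, 2, Vt, S))
```

Either measure can then be handed to any standard clustering algorithm (for example agglomerative clustering on 1 - sim) to obtain the word or document clusters used in the applications listed above.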

Example Applications in Linguistic Processing
Information Retrieval
–"concept matching" vs. "lexical matching": relevant documents are associated with similar "concepts", but may not include exactly the same words
–example approach: treat the query as a new document (by "folding in"), and evaluate its "similarity" with all possible documents
Fold-in
–consider a new document outside of the training corpus T, but with similar language patterns or "concepts"
–construct a new column d_p, p > N, with respect to the M words
–assuming U and S remain unchanged: d_p = U S v_p^T (just as a column in W = U S V^T)
–v_p S = d_p^T U is an R-dim representation of the new document (i.e. the projection of d_p onto the basis vectors e_i, the columns of U, obtained by inner products)
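
A sketch of folding in a query and ranking the training documents by similarity, reusing word_index, U, S and Vt from the earlier sketches (the helper name fold_in and the small smoothing constant are assumptions):

```python
# Sketch of "folding in" a query as a new document d_p and ranking documents.
from collections import Counter
import numpy as np

def fold_in(text, word_index, U):
    """Return the R-dim representation v_p S = d_p^T U of a new document/query."""
    d_p = np.zeros(len(word_index))
    for w, c in Counter(text.split()).items():
        if w in word_index:                  # out-of-vocabulary words are simply dropped
            d_p[word_index[w]] = c
    return d_p @ U                           # projection of d_p onto the columns of U

query_vec = fold_in("cat on a mat", word_index, U)
sims = [query_vec @ (Vt[:, j] @ S) /
        (np.linalg.norm(query_vec) * np.linalg.norm(Vt[:, j] @ S) + 1e-12)
        for j in range(Vt.shape[1])]         # similarity with every training document d_j
print(np.argsort(sims)[::-1])                # documents ranked by "concept" similarity
```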

Integration with N-gram Language Models
Language Modeling for Speech Recognition
–Prob(w_q | d_{q-1})
  w_q: the q-th word in the current document to be recognized (q: sequence index)
  d_{q-1}: the recognized history in the current document
  v_{q-1} S = d_{q-1}^T U: the folded-in R-dim representation of d_{q-1}
–Prob(w_q | d_{q-1}) can be estimated from u_q S and v_{q-1} S in the R-dim space
–integration with N-gram: Prob(w_q | H_{q-1}) = Prob(w_q | h_{q-1}, d_{q-1})
  H_{q-1}: history up to w_{q-1}
  h_{q-1}: the N-gram history, i.e. the preceding N-1 words w_{q-N+1}, ..., w_{q-1}
–N-gram gives local relationships, while d_{q-1} gives semantic concepts
–d_{q-1} emphasizes the key content words more, while the N-gram counts all words similarly, including function words
–v_{q-1} S for d_{q-1} can be estimated iteratively: assuming the q-th word in the current document is w_i, the i-th of the M word counts in d_q is incremented when w_i is recognized; v_q S moves in the R-dim space initially and eventually settles down somewhere
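
The slide leaves the exact combination formula to the cited reference. Purely as an illustrative sketch (not the integration scheme of the reference), the snippet below derives an LSA word probability from the closeness of u_i S and v_{q-1} S via a softmax and linearly interpolates it with an n-gram probability; the softmax mapping and the interpolation weight lam are assumptions.

```python
# Illustrative sketch only: LSA probability from cosine closeness, then linear
# interpolation with an n-gram probability supplied by the caller.
import numpy as np

def lsa_word_probs(doc_history_vec, U, S):
    """P_LSA(w | d_{q-1}) from cos(u_i S, v_{q-1} S), normalized over the vocabulary."""
    word_vecs = U @ S                                        # rows are u_i S
    sims = word_vecs @ doc_history_vec / (
        np.linalg.norm(word_vecs, axis=1) * np.linalg.norm(doc_history_vec) + 1e-12)
    e = np.exp(sims - sims.max())                            # softmax over all M words
    return e / e.sum()

def combined_prob(w_id, ngram_prob, doc_history_vec, U, S, lam=0.5):
    """lam * P_ngram(w_q | h_{q-1}) + (1 - lam) * P_LSA(w_q | d_{q-1})."""
    return lam * ngram_prob + (1.0 - lam) * lsa_word_probs(doc_history_vec, U, S)[w_id]

# doc_history_vec = fold_in(recognized_history_text, word_index, U)  # v_{q-1} S, as above
```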

Probabilistic Latent Semantic Analysis (PLSA)
Has the same goal as LSA, using a set of latent topics {T_k} to construct a new relationship between the documents and the terms, but within a probabilistic framework
Trained with EM by maximizing the total likelihood
  L = Σ_i Σ_j n(D_i, t_j) log P(t_j | D_i), with P(t_j | D_i) = Σ_k P(t_j | T_k) P(T_k | D_i)
  n(D_i, t_j): frequency count of term t_j in the document D_i
  D_i: documents, T_k: latent topics, t_j: terms
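
A minimal EM sketch for PLSA under the likelihood above, treating the earlier count matrix W as n(D_i, t_j); the number of topics K, the fixed iteration count, and the random initialization are assumptions.

```python
# Minimal EM sketch for PLSA: counts is an M x N term-document count matrix.
import numpy as np

def plsa(counts, K=2, n_iter=50, seed=0):
    """Return P(t_j | T_k) as an (M, K) array and P(T_k | D_i) as a (K, N) array."""
    rng = np.random.default_rng(seed)
    M, N = counts.shape
    p_t_given_k = rng.random((M, K)); p_t_given_k /= p_t_given_k.sum(axis=0, keepdims=True)
    p_k_given_d = rng.random((K, N)); p_k_given_d /= p_k_given_d.sum(axis=0, keepdims=True)

    for _ in range(n_iter):
        # E-step: posterior P(T_k | D_i, t_j), shape (M, N, K)
        joint = p_t_given_k[:, None, :] * p_k_given_d.T[None, :, :]
        post = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)
        # M-step: re-estimate both conditional distributions from the weighted counts
        weighted = counts[:, :, None] * post               # n(D_i, t_j) * posterior
        p_t_given_k = weighted.sum(axis=1)
        p_t_given_k /= p_t_given_k.sum(axis=0, keepdims=True) + 1e-12
        p_k_given_d = weighted.sum(axis=0).T
        p_k_given_d /= p_k_given_d.sum(axis=0, keepdims=True) + 1e-12
    return p_t_given_k, p_k_given_d

p_t_k, p_k_d = plsa(W, K=2)
loglik = (W * np.log(p_t_k @ p_k_d + 1e-12)).sum()         # the total likelihood being maximized
print(loglik)
```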