600.465 - Intro to NLP - J. Eisner: Words vs. Terms
Taken from Jason Eisner's NLP class slides: www.cs.jhu.edu/~eisner


Slide 1: Words vs. Terms
Taken from Jason Eisner's NLP class slides: www.cs.jhu.edu/~eisner

Slide 2: Latent Semantic Analysis
- A trick from Information Retrieval
- Each document in the corpus is a length-k vector
- Or each paragraph, or whatever
[Figure: a single document represented as a count vector (0, 3, 3, 1, 0, 7, ... 1, 0), one coordinate per vocabulary word: aardvark, abacus, abandoned, abbot, abduct, above, ..., zygote, zymurgy]
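Not from the slides, but a minimal Python sketch of how such count vectors could be built; the toy corpus and vocabulary are invented for illustration:

```python
import numpy as np

# Toy corpus: each "document" is just a few tokens (made up for illustration).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "stocks fell as markets slid",
]

# Vocabulary = all word types seen in the corpus, one coordinate per type.
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

def count_vector(doc):
    """Turn one document into a length-k count vector (k = vocabulary size)."""
    v = np.zeros(len(vocab), dtype=int)
    for w in doc.split():
        v[index[w]] += 1
    return v

M = np.column_stack([count_vector(d) for d in docs])   # terms x documents matrix
print(vocab)
print(M)
```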

Slide 3: Latent Semantic Analysis
- A trick from Information Retrieval
- Each document in the corpus is a length-k vector
- Plot all documents in the corpus
[Figure: true plot in k dimensions vs. reduced-dimensionality plot]

Slide 4: Latent Semantic Analysis
- The reduced plot is a perspective drawing of the true plot
- It projects the true plot onto a few axes
- A best choice of axes: one that shows the most variation in the data
- Found by linear algebra: "Singular Value Decomposition" (SVD)
[Figure: true plot in k dimensions vs. reduced-dimensionality plot]

Slide 5: Latent Semantic Analysis
- The SVD plot allows the best possible reconstruction of the true plot (i.e., it can recover the 3-D coordinates with minimal distortion)
- It ignores variation along the axes that it didn't pick
- Hope: that variation is just noise, and we want to ignore it
[Figure: true plot in k dimensions (axes word 1, word 2, word 3) vs. reduced-dimensionality plot (axes theme A, theme B)]

Slide 6: Latent Semantic Analysis
- SVD finds a small number of theme vectors
- It approximates each doc as a linear combination of themes
- Coordinates in the reduced plot = the linear coefficients: how much of theme A is in this document? How much of theme B?
- Each theme is a collection of words that tend to appear together
[Figure: true plot in k dimensions vs. reduced-dimensionality plot with axes theme A, theme B]

Slide 7: Latent Semantic Analysis
- The new coordinates might actually be useful for Information Retrieval
- To compare 2 documents, or a query and a document: project both into the reduced space and ask whether they have themes in common
- They may match even if they have no words in common!
[Figure: true plot in k dimensions vs. reduced-dimensionality plot with axes theme A, theme B]
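Below is a small numpy sketch of this comparison (not from the slides; the term-document counts are invented). A one-word query and each document are projected into a 2-theme space found by SVD and compared by cosine similarity:

```python
import numpy as np

# Hypothetical terms x documents count matrix (rows = terms, columns = docs).
terms = ["car", "automobile", "engine", "flower", "petal"]
M = np.array([
    [3, 0, 0, 0],   # car
    [0, 2, 1, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 0, 3],   # flower
    [0, 0, 0, 2],   # petal
], dtype=float)

U, S, Vt = np.linalg.svd(M, full_matrices=False)
j = 2                                 # keep 2 themes

def project(v):
    """Map a term-space vector into the j-dimensional theme space."""
    return U[:, :j].T @ v

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

query = np.array([1, 0, 0, 0, 0], dtype=float)    # the query is just {"car"}
for d in range(M.shape[1]):
    doc = M[:, d]
    print("doc", d, "word overlap:", query @ doc,
          "theme-space cosine:", round(cosine(project(query), project(doc)), 3))
# Document 1 shares no words with the query, yet its theme-space cosine is
# high, because "automobile" and "engine" load on the same theme as "car".
```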

Slide 8: Latent Semantic Analysis
- Themes extracted for IR might help sense disambiguation
- Each word is like a tiny document: (0, 0, 0, 1, 0, 0, ...)
- Express the word as a linear combination of themes
- Each theme corresponds to a sense?
  - E.g., "Jordan" has Mideast and Sports themes (plus an Advertising theme, alas, which is the same sense as Sports)
- A word's sense in a document: which of its themes are strongest in the document?
- This groups senses as well as splitting them: one word has several themes, and many words have the same theme

Slide 9: Latent Semantic Analysis
- Another perspective (similar to neural networks): a matrix of strengths records how strong each term is in each document
- Each connection has a weight given by the matrix
[Figure: a two-layer network connecting term nodes to document nodes]

Slide 10: Latent Semantic Analysis
- Which documents is term 5 strong in?
- Docs 2, 5, 6 light up strongest
[Figure: the term-document network with term 5 activated]

Slide 11: Latent Semantic Analysis
- Which documents are terms 5 and 8 strong in?
- This answers a query consisting of terms 5 and 8!
- It is really just matrix multiplication: term vector (query) x strength matrix = doc vector
[Figure: the term-document network with terms 5 and 8 activated]
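A numpy sketch of that multiplication. Only the pattern for term 5 (strong in docs 2, 5, and 6, as on slide 10) comes from the slides; every other number below is invented for illustration:

```python
import numpy as np

# Hypothetical 8-terms x 6-documents strength matrix.
M = np.array([
    # doc:  1    2    3    4    5    6
        [0.1, 0.0, 0.3, 0.0, 0.2, 0.0],   # term 1
        [0.0, 0.2, 0.0, 0.4, 0.0, 0.1],   # term 2
        [0.3, 0.0, 0.1, 0.0, 0.0, 0.2],   # term 3
        [0.0, 0.1, 0.0, 0.2, 0.1, 0.0],   # term 4
        [0.0, 0.8, 0.1, 0.0, 0.9, 0.7],   # term 5  (strong in docs 2, 5, 6)
        [0.2, 0.0, 0.4, 0.1, 0.0, 0.0],   # term 6
        [0.0, 0.3, 0.0, 0.0, 0.2, 0.1],   # term 7
        [0.1, 0.0, 0.5, 0.0, 0.6, 0.0],   # term 8  (strong in docs 3, 5)
])

# A query on terms 5 and 8 is just a 0/1 term vector; multiplying it by the
# strength matrix sums those two rows and yields one score per document.
query = np.zeros(8)
query[[4, 7]] = 1.0              # terms are 1-based on the slide, arrays are 0-based

doc_scores = query @ M           # term vector (query) x strength matrix = doc vector
print(doc_scores)                # doc 5 scores highest: strong in BOTH terms
```

Conversely, as on slide 12 below, reading off a single column (e.g. M[:, 4]) lists which terms are strong in document 5.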

Slide 12: Latent Semantic Analysis
- Conversely, what terms are strong in document 5?
- This gives doc 5's coordinates!
[Figure: the term-document network with document 5 activated]

Slide 13: Latent Semantic Analysis
- SVD approximates this by a smaller 3-layer network
- It forces the sparse data through a bottleneck, smoothing it
[Figure: the original two-layer terms-documents network vs. a 3-layer network with a small middle layer of themes]

Slide 14: Latent Semantic Analysis
- I.e., smooth the sparse data by a matrix approximation: M ≈ A B
- A encodes the camera angle; B gives each doc's new coordinates
[Figure: the terms x documents matrix M factored into A (terms x themes) times B (themes x documents)]
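A brief numpy illustration of the smoothing, on an invented count matrix: factor M into a 2-theme product A B and compare the reconstruction to the original.

```python
import numpy as np

# Small, sparse, hypothetical terms x documents count matrix.
terms = ["ship", "boat", "ocean", "wood", "tree"]
M = np.array([
    [1.0, 1.0, 0.0, 0.0, 0.0],   # ship
    [1.0, 0.0, 1.0, 0.0, 0.0],   # boat
    [0.0, 1.0, 1.0, 0.0, 0.0],   # ocean
    [0.0, 0.0, 0.0, 1.0, 1.0],   # wood
    [0.0, 0.0, 0.0, 1.0, 0.0],   # tree
])

U, S, Vt = np.linalg.svd(M, full_matrices=False)
j = 2
A = U[:, :j]                     # terms x themes ("camera angle")
B = np.diag(S[:j]) @ Vt[:j]      # themes x documents (each doc's new coordinates)

M_hat = A @ B                    # the smoothed approximation M ≈ A B
np.set_printoptions(precision=2, suppress=True)
print(M_hat)
# "ship" never occurs in document 3 (M[0, 2] == 0), but the smoothed matrix
# gives that cell a clearly nonzero value (about 0.67), because document 3
# uses other words from the same theme.  Sparse data forced through the
# 2-theme bottleneck comes out smoothed.
```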

Slide 15: Latent Semantic Analysis
- Completely symmetric! Regard A and B as projecting terms and docs into a low-dimensional "theme space" where their similarity can be judged.
[Figure: the matrix M factored as A times B, with both terms and documents mapping into the theme layer]

Slide 16: Latent Semantic Analysis
- Completely symmetric. Regard A and B as projecting terms and docs into a low-dimensional "theme space" where their similarity can be judged.
- Cluster documents (helps the sparsity problem!)
- Cluster words
- Compare a word with a doc
- Identify a word's themes with its senses
  - Sense disambiguation by looking at the document's senses
- Identify a document's themes with its topics
  - Topic categorization

Slide 17: If you've seen SVD before …
- SVD actually decomposes M = A D B' exactly
- A = camera angle (orthonormal); D is diagonal; B' is orthonormal
[Figure: the terms x documents matrix M written as the product A D B']
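A quick numpy check of this exact decomposition on an arbitrary matrix (numpy calls the three factors U, S, and Vt):

```python
import numpy as np

rng = np.random.default_rng(42)
M = rng.integers(0, 4, size=(6, 5)).astype(float)   # 6 terms x 5 documents, arbitrary counts

A, S, Bt = np.linalg.svd(M, full_matrices=False)    # M = A @ diag(S) @ Bt, exactly
D = np.diag(S)

print(np.allclose(M, A @ D @ Bt))                   # True: the decomposition is exact
print(np.allclose(A.T @ A, np.eye(A.shape[1])))     # True: A's columns are orthonormal
print(np.allclose(Bt @ Bt.T, np.eye(Bt.shape[0])))  # True: the rows of B' are orthonormal
```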

Slides 18-19: If you've seen SVD before …
- Keep only the largest j < k diagonal elements of D
- This gives the best possible approximation to M using only j blue units (the theme units in the network picture)
[Figure: M approximated as A times a truncated D times B', with successively fewer diagonal elements kept]
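A numpy sketch of the truncation on an invented matrix, printing the approximation error for each choice of j:

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.random((8, 6))                      # hypothetical terms x documents matrix

A, S, Bt = np.linalg.svd(M, full_matrices=False)

for j in range(1, len(S) + 1):
    D_j = np.diag(S[:j])                    # keep only the j largest diagonal elements of D
    M_j = A[:, :j] @ D_j @ Bt[:j]           # approximation using only j theme units
    err = np.linalg.norm(M - M_j)           # Frobenius distance from the original
    print("j =", j, " error =", round(err, 4))
# The error shrinks as j grows and reaches 0 at j = k.  For each j, no rank-j
# matrix is closer to M than this truncated SVD (the Eckart-Young theorem),
# which is the sense in which it is the "best possible approximation".
```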

Slide 20: If you've seen SVD before …
- To simplify the picture, we can write M ≈ A (D B') = A B, i.e. fold D into B
- How should you pick j (the number of blue units)?
- Just like picking the number of clusters: how well does the system work with each j (on held-out data)?
[Figure: M approximated as A times B, where B = D B']
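One simple way to make the held-out idea concrete (this particular recipe is not from the slides): hide a few cells of an invented matrix, fit the truncated SVD to the rest, and see which j predicts the hidden cells best.

```python
import numpy as np

rng = np.random.default_rng(0)
true_A = rng.random((10, 2))
true_B = rng.random((2, 8))
M = true_A @ true_B + 0.05 * rng.standard_normal((10, 8))   # noisy, roughly rank-2 data

heldout = [(1, 3), (4, 0), (7, 6), (2, 5), (9, 2)]           # (term, doc) cells to hide
M_train = M.copy()
for t, d in heldout:
    M_train[t, d] = 0.0                                      # crude imputation: just zero them

U, S, Vt = np.linalg.svd(M_train, full_matrices=False)
for j in range(1, 7):
    M_j = U[:, :j] @ np.diag(S[:j]) @ Vt[:j]                 # rank-j reconstruction
    err = np.mean([(M[t, d] - M_j[t, d]) ** 2 for t, d in heldout])
    print("j =", j, " held-out error =", round(err, 4))
# Pick the j with the lowest held-out error, analogous to picking the number
# of clusters by how well the model fits data it has not seen.
```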