Latent Semantic Indexing Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001

Administration Example analogies…

And-or Proof out(x) = g(sum_k w_k x_k). Weights w_1 = 10, w_2 = 10, w_3 = -10 encode x_1 + x_2 + ~x_3. Sum for 110? Sum for 001? Generally? For b = 110, sum_i |b_i - x_i|. What happens if we set w_0 = 10? w_0 = -15?
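
A minimal sketch (not from the slides) of the threshold unit above: the weights 10, 10, -10 come from the slide, while the bias w_0 = 5 and the hard-threshold g are assumptions (one choice that separates the clause; the slide's own questions about w_0 = 10 and w_0 = -15 can be explored by editing it).

```python
# Sketch (not from the slides): a linear threshold unit encoding the clause
# x1 OR x2 OR (NOT x3). Weights 10, 10, -10 are from the slide; the bias
# w0 = 5 and the hard-threshold g are assumptions.
from itertools import product

def g(s):
    return 1 if s > 0 else 0          # hard threshold

w = [10, 10, -10]
w0 = 5

for x in product([0, 1], repeat=3):
    s = w0 + sum(wk * xk for wk, xk in zip(w, x))
    clause = x[0] or x[1] or (not x[2])
    print(x, g(s), int(clause))       # the unit's output matches the clause on all 8 inputs
```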

LSI Background Reading Landauer, Laham, and Foltz (1998). Learning human-like knowledge by Singular Value Decomposition: A Progress Report. Advances in Neural Information Processing Systems 10.

Outline Linear nets, autoassociation LSI: Cross between IR and NNs

Purely Linear Network [Diagram: inputs x_1, x_2, x_3, …, x_D feed hidden units h_1, h_2, …, h_k through a weight matrix W (n×k); the hidden layer feeds a single output out through weights U (k×1).]

What Does It Do? out(x) = sum_j (sum_i x_i W_ij) U_j = sum_i x_i (sum_j W_ij U_j). So the network is equivalent to a single-layer linear net with weights W'_i = sum_j W_ij U_j. [Diagram: inputs x_1, x_2, x_3, …, x_D connect directly to out through W' (n×1).]
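
A quick numerical check of this collapse (a sketch, not course code): with random W and U, the two-layer linear net and the single layer with weights W' = W U produce identical outputs; all sizes below are arbitrary example values.

```python
# Sketch (not from the slides): composing two linear layers is the same as one
# linear layer. Names W, U, W_prime follow the slide; sizes are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 3
W = rng.normal(size=(n, k))    # input-to-hidden weights (n x k)
U = rng.normal(size=(k, 1))    # hidden-to-output weights (k x 1)
x = rng.normal(size=(1, n))    # one input vector

two_layer = x @ W @ U          # out(x) = sum_j (sum_i x_i W_ij) U_j
W_prime = W @ U                # W'_i = sum_j W_ij U_j
one_layer = x @ W_prime

print(np.allclose(two_layer, one_layer))   # True: extra linear layers add no power
```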

Can Other Layers Help? [Diagram: inputs x_1, x_2, x_3, x_4 feed hidden units h_1, h_2 through U (n×k); the hidden layer feeds outputs out_1, out_2, out_3, out_4 through V (k×n).]

Autoassociator [Diagram: inputs x_1, x_2, x_3, … pass through a small hidden layer h_1, h_2, … and are reconstructed as outputs y_1, y_2, y_3, …; the network is trained so that the outputs match the inputs.]

Applications Autoassociators have been used for data compression, feature discovery, and many other tasks. The U matrix encodes the inputs into k features. How do we train it?
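
One answer to "How do we train it?" is gradient descent on the squared reconstruction error (backpropagation, here in the purely linear case). A minimal sketch, with the toy data, the sizes, and the learning rate all illustrative assumptions:

```python
# Sketch (not from the slides): training a linear autoassociator by gradient
# descent on squared reconstruction error. U encodes inputs into k features,
# V decodes; data, sizes, and the learning rate are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, k, m = 8, 2, 200
# Toy data that is (approximately) rank 2, so a 2-unit bottleneck suffices.
M = rng.normal(size=(m, 2)) @ rng.normal(size=(2, n)) + 0.01 * rng.normal(size=(m, n))

U = rng.normal(scale=0.5, size=(n, k))    # encoder weights
V = rng.normal(scale=0.5, size=(k, n))    # decoder weights
lr = 0.01

for step in range(2000):
    H = M @ U                  # k-dimensional encodings
    E = H @ V - M              # reconstruction error
    grad_V = H.T @ E / m
    grad_U = M.T @ (E @ V.T) / m
    U -= lr * grad_U
    V -= lr * grad_V
    if step % 500 == 0:
        print(step, np.mean(E ** 2))   # reconstruction error falls as training proceeds
```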

SVD Singular value decomposition provides another method, from linear algebra. Training data M is n×m (input features by examples). M = U Σ V^T, where U^T U = I, V^T V = I, and Σ is diagonal.
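
A small sketch of the decomposition with NumPy (the count matrix here is an arbitrary toy; note that numpy.linalg.svd returns V^T rather than V):

```python
# Sketch (not from the slides): SVD of a small term-by-passage count matrix.
import numpy as np

rng = np.random.default_rng(2)
n, m = 6, 4                      # n input features (words), m examples (passages)
M = rng.poisson(1.0, size=(n, m)).astype(float)

U, s, Vt = np.linalg.svd(M, full_matrices=False)    # M = U @ diag(s) @ Vt

print(np.allclose(M, U @ np.diag(s) @ Vt))          # the factorization holds
print(np.allclose(U.T @ U, np.eye(U.shape[1])))     # U^T U = I
print(np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0])))  # V^T V = I
```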

Dimension Reduction Finds the least-squares best U (n×k, k free). Rows of U map input features to encoded features (an instance's encoding is the sum of its features' rows). Closely related to the symmetric eigenvalue decomposition, factor analysis, and principal component analysis. A subroutine in many math packages.
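
A sketch of the reduction itself: truncating the SVD to the top k singular triples gives the least-squares best rank-k approximation of M, and an instance's encoding is the count-weighted sum of its features' rows of the truncated U. The matrix and k = 2 below are toy choices.

```python
# Sketch (not from the slides): rank-k truncation of the SVD as dimension reduction.
import numpy as np

rng = np.random.default_rng(2)
M = rng.poisson(1.0, size=(6, 4)).astype(float)     # features x examples
U, s, Vt = np.linalg.svd(M, full_matrices=False)

k = 2
U_k = U[:, :k]
M_k = U_k @ np.diag(s[:k]) @ Vt[:k, :]              # best rank-k approximation of M
print(np.linalg.norm(M - M_k))                      # least-squares error of the truncation

# An instance (a column of feature counts) is encoded as the count-weighted
# sum of its features' rows of U_k.
x = M[:, 0]
print(x @ U_k)                                      # k-dimensional encoding
```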

SVD Applications Eigenfaces. Handwriting recognition. Text applications…

LSI/LSA Latent semantic indexing is the application of SVD to IR. Latent semantic analysis is the more general term. Features are words, examples are text passages. Latent: not visible on the surface. Semantic: word meanings.

Running LSI Learns new word representations! Trained on: 20,000-60,000 words; 1,000-70,000 passages. Use k hidden units. Similarity between vectors computed as cosine.

Step by Step 1. M_ij: rows are words, columns are passages, filled with counts. 2. Transformation of the matrix: each cell becomes log(M_ij + 1) / H_i, where H_i = -sum_j p_ij log p_ij and p_ij = M_ij / sum_j M_ij (the entropy of word i over passages). 3. SVD computed: M = U Σ V^T. 4. Best k components of the rows of U kept as word representations.
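
A minimal end-to-end sketch of these four steps on a toy three-passage corpus. The log/entropy weighting follows the slide; the passages, k = 2, and the entropy floor are illustrative assumptions (real runs use thousands of passages and a few hundred dimensions, where zero entropies are rare).

```python
# Sketch (not from the course): the step-by-step LSI recipe on a toy corpus.
import numpy as np

passages = [
    "the feline climbed upon the roof",
    "a cat leapt onto a house",
    "the final will be on a thursday",
]
vocab = sorted({w for p in passages for w in p.split()})
index = {w: i for i, w in enumerate(vocab)}

# 1. Word-by-passage matrix filled with counts.
M = np.zeros((len(vocab), len(passages)))
for j, p in enumerate(passages):
    for w in p.split():
        M[index[w], j] += 1

# 2. Transformation: log(count + 1) divided by the entropy of the word's
#    distribution over passages (floored at 0.1 so single-passage words in
#    this tiny corpus do not divide by zero).
p_ij = M / M.sum(axis=1, keepdims=True)
with np.errstate(divide="ignore", invalid="ignore"):
    ent = -np.sum(np.where(p_ij > 0, p_ij * np.log(p_ij), 0.0), axis=1)
X = np.log(M + 1) / np.maximum(ent, 0.1)[:, None]

# 3. SVD.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# 4. Keep the best k components of the rows of U as word representations.
k = 2
word_vecs = {w: U[index[w], :k] for w in vocab}
print(word_vecs["cat"])
```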

Geometric View Words embedded in high-dimensional space. [Diagram: example word points labeled exam, test, and fish.]

Comparison to VSM A: The feline climbed upon the roof. B: A cat leapt onto a house. C: The final will be on a Thursday. How similar? Vector space model: sim(A,B) = 0. LSI: sim(A,B) = .49 > sim(A,C) = .45. Non-zero similarity with no words in common, by overlap in the reduced representation.
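
A sketch of how such passage similarities are computed: each passage is represented as the sum of its words' reduced vectors and compared by cosine. The 2-d word vectors below are hand-made purely so the snippet runs; they are not LSA output, and a toy run will not reproduce the .49/.45 figures, which come from a full training corpus.

```python
# Sketch (not from the course): passage similarity in the reduced space.
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def passage_vector(text, word_vecs):
    vecs = [word_vecs[w] for w in text.lower().split() if w in word_vecs]
    return np.sum(vecs, axis=0)

word_vecs = {                       # hypothetical reduced word representations
    "feline": np.array([0.9, 0.1]), "roof": np.array([0.7, 0.3]),
    "cat": np.array([0.8, 0.2]),    "house": np.array([0.6, 0.4]),
    "final": np.array([0.1, 0.9]),  "thursday": np.array([0.2, 0.8]),
}

A = "The feline climbed upon the roof"
B = "A cat leapt onto a house"
C = "The final will be on a Thursday"
print(cosine(passage_vector(A, word_vecs), passage_vector(B, word_vecs)))  # high
print(cosine(passage_vector(A, word_vecs), passage_vector(C, word_vecs)))  # lower
```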

What Does LSI Do? Let’s send it to school…

Plato’s Problem A 7th grader learns new words today, fewer than 1 by direct instruction. Perhaps 3 were even encountered. How can this be? Plato: you already knew them. LSA: many weak relationships combined (data to back it up!). Rate comparable to students.

Vocabulary TOEFL synonym test: choose the alternative with the highest similarity score. LSA correct on 64% of 80 items. Matches the average applicant to a US college. Mistakes correlate with people’s mistakes (r = .44). (Vocabulary is said to be the best solo measure of intelligence.)
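
A sketch of the test-taking procedure: pick the alternative whose vector has the highest cosine with the prompt word. The word vectors here are hand-made 2-d numbers for illustration, not LSA output.

```python
# Sketch (not from the course): answering a TOEFL-style synonym item.
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def answer_synonym_item(prompt, alternatives, word_vecs):
    return max(alternatives, key=lambda w: cosine(word_vecs[prompt], word_vecs[w]))

word_vecs = {                     # hypothetical word representations
    "exam": np.array([0.9, 0.1]),
    "test": np.array([0.8, 0.2]),
    "fish": np.array([0.1, 0.9]),
    "roof": np.array([0.4, 0.6]),
}
print(answer_synonym_item("exam", ["test", "fish", "roof"], word_vecs))  # -> test
```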

Multiple Choice Exam Trained on a psych textbook. Given the same test as students. LSA scores 60%: lower than average, but passes. Has trouble with “hard” ones.

Essay Test LSA can’t write. If you can’t do, judge. Students write essays; LSA is trained on related text. Compare similarity and length with graded (labeled) essays. Cosine-weighted average of the top 10 most similar graded essays. Regression to combine sim and len. Correlation: better than human. Bag of words!?
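
A sketch (not from the course) of one way to realize the scheme above: score an essay as the cosine-weighted average grade of its 10 most similar pre-graded essays, then fit a small regression combining that score with essay length. The essay vectors, grades, and lengths are random, purely to make the snippet run.

```python
# Sketch: similarity-based essay score plus a regression on score and length.
import numpy as np

rng = np.random.default_rng(3)
graded_vecs = rng.normal(size=(50, 10))      # LSA-style vectors of graded essays
grades = rng.uniform(1, 6, size=50)          # their human grades
lengths = rng.integers(100, 500, size=50)    # their lengths in words

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def sim_score(essay_vec, top=10):
    sims = np.array([cosine(essay_vec, g) for g in graded_vecs])
    idx = np.argsort(sims)[-top:]                         # 10 most similar essays
    # abs() keeps weights non-negative in this random toy data
    return float(np.average(grades[idx], weights=np.abs(sims[idx])))

sim_scores = np.array([sim_score(v) for v in graded_vecs])
X = np.column_stack([sim_scores, lengths, np.ones(len(grades))])
coef, *_ = np.linalg.lstsq(X, grades, rcond=None)
print(coef)   # regression weights for similarity score, length, and intercept
```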

Digit Representations Look at similarities of all pairs from one to nine. Look at best fit of these similarities in one dimension: they come out in order! Similar experiments with cities in Europe in two dimensions.

Word Sense “The chemistry student knew this was not a good time to forget how to calculate volume and mass.” Similarity to heavy: .21; to church: .14. LSI picks the best match (p < .001).

More Tests Antonyms are just as similar as synonyms (cluster analysis separates them). LSA correlates .50 with children and .32 with adults on word sorting (misses grammatical classification). Priming, conjunction error: similarity correlates with strength of effect.

Conjunction Error Linda is a young woman who is single, outspoken… deeply concerned with issues of discrimination and social justice. Is Linda a feminist bank teller? Is Linda a bank teller? 80% rank the former higher. Can’t be! Pr(f ∧ bt | Linda) = Pr(bt | Linda) Pr(f | Linda, bt) ≤ Pr(bt | Linda).

LSApplications 1. Improve IR. 2. Cross-language IR: train on a parallel collection. 3. Measure text coherency. 4. Use essays to pick educational text. 5. Grade essays. Demos at

Analogies Compare difference vectors: a geometric instantiation of the relationship. [Diagram: dog → bark parallel to cow → moo.]
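
A sketch of the difference-vector comparison for an analogy such as dog : bark :: cow : moo. The 2-d vectors are hand-made so the snippet runs; they are not real LSA output.

```python
# Sketch (not from the course): testing an analogy by comparing difference vectors.
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

word_vecs = {                      # hypothetical word representations
    "dog":  np.array([0.8, 0.2]),
    "bark": np.array([0.9, 0.6]),
    "cow":  np.array([0.3, 0.1]),
    "moo":  np.array([0.4, 0.5]),
}

rel_ab = word_vecs["bark"] - word_vecs["dog"]   # relationship vector dog -> bark
rel_cd = word_vecs["moo"] - word_vecs["cow"]    # relationship vector cow -> moo
print(cosine(rel_ab, rel_cd))                   # near 1 when the relationships are parallel
```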

LSA Motto? (AT&T Cafeteria) “Syntax sucks.”

What to Learn Single-output, multiple-layer linear nets compute the same functions as single-output, single-layer linear nets. Autoassociation finds encodings. LSI is the application of this idea to text.

Homework 10 (due 12/12) 1. Describe a procedure for converting a Boolean formula in CNF (n variables, m clauses) into an equivalent backprop network. How many hidden units does it have? 2. A key issue in LSI is picking “k”, the number of dimensions. Let’s say we had a set of 10,000 passages. Explain how we could combine the idea of cross validation and autoassociation to select a good value for k.