CS621 : Artificial Intelligence Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 28: Principal Component Analysis; Latent Semantic Analysis

Desired Features of Search Engines
Meaning based
– More relevant results
Multilingual
– Query in English, e.g.
– Fetch documents in Hindi
– Show them in English

Precision (P) and Recall (R)
There is a tradeoff between P and R.
Actual relevant set: A; Obtained (retrieved) set: O; their intersection: S (the shaded region of the Venn diagram on the slide).
P = |S| / |O|
R = |S| / |A|
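A minimal Python sketch of these two formulas, using hypothetical integer document IDs:

```python
def precision_recall(obtained, actual):
    """Precision and recall from sets of retrieved and relevant doc IDs."""
    s = obtained & actual   # the intersection S
    return len(s) / len(obtained), len(s) / len(actual)  # P = |S|/|O|, R = |S|/|A|

# Hypothetical example: 3 of the 4 retrieved documents are relevant,
# out of 6 relevant documents in the collection.
p, r = precision_recall({1, 2, 3, 4}, {2, 3, 4, 5, 6, 7})
print(p, r)   # 0.75 0.5
```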

Impediments to Good P and R
Synonymy: a word in the document will not match its synonym in the query, bringing down recall.
– Query: "Planes for Bangkok"
– Text in the document: "Flights to Bangkok"
Polysemy: a word in the query will bring up documents containing the same word but used in a different sense, bringing down precision.
– Query: "Planes for Bangkok"
– Text in the document: "Cartesian planes"

Principal Component Analysis

Example: IRIS Data (only 3 of the 150 records)

ID | Petal Length (a1) | Petal Width (a2) | Sepal Length (a3) | Sepal Width (a4) | Classification
1  | …                 | …                | …                 | …                | Iris-setosa
2  | …                 | …                | …                 | …                | Iris-versicolor
3  | …                 | …                | …                 | …                | Iris-virginica

Training and Testing Data
Training: 80% of the data; 40 from each class, 120 in total
Testing: the remaining 30
Do we have to consider all 4 attributes for classification?
Do we have to have 4 neurons in the input layer?
Fewer neurons in the input layer may reduce the overall size of the network and thereby reduce training time.
It is also likely to improve generalization performance (Occam's Razor hypothesis: a simpler hypothesis, i.e., a smaller neural net, generalizes better).
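A sketch of such a stratified 80/20 split with numpy; the array contents are hypothetical stand-ins, since the slides do not reproduce the data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))     # stand-in for the 150 x 4 IRIS features
y = np.repeat([0, 1, 2], 50)      # 50 records per class

# 40 training rows from each of the 3 classes (120), the rest for testing (30).
train_idx, test_idx = [], []
for c in range(3):
    idx = rng.permutation(np.where(y == c)[0])
    train_idx.extend(idx[:40])
    test_idx.extend(idx[40:])
print(len(train_idx), len(test_idx))  # 120 30
```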

The multivariate data: n observations (rows) on p variables (columns)

X1   X2   X3   X4   X5   …   Xp
x11  x12  x13  x14  x15  …   x1p
x21  x22  x23  x24  x25  …   x2p
x31  x32  x33  x34  x35  …   x3p
x41  x42  x43  x44  x45  …   x4p
…
xn1  xn2  xn3  xn4  xn5  …   xnp

Some preliminaries
Sample mean of the i-th variable: µ_i = (1/n) Σ_{j=1}^{n} x_ij
Variance of the i-th variable: σ_i² = (1/(n-1)) Σ_{j=1}^{n} (x_ij - µ_i)²
Sample covariance: c_ab = (1/(n-1)) Σ_{j=1}^{n} (x_aj - µ_a)(x_bj - µ_b)
Covariance measures how strongly two variables vary together.
In fact, the correlation coefficient is r_ab = c_ab / (σ_a σ_b).
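These quantities in a minimal numpy sketch, on a small hypothetical data matrix:

```python
import numpy as np

# Hypothetical data: n = 6 observations on p = 3 variables.
X = np.array([[5.1, 3.5, 1.4], [4.9, 3.0, 1.4], [6.2, 2.9, 4.3],
              [5.9, 3.0, 5.1], [6.7, 3.1, 4.4], [5.5, 2.4, 3.8]])

mu = X.mean(axis=0)               # sample mean of each variable
var = X.var(axis=0, ddof=1)       # variance with the n-1 denominator
C = np.cov(X, rowvar=False)       # sample covariance matrix c_ab
R = np.corrcoef(X, rowvar=False)  # correlations r_ab = c_ab / (sigma_a * sigma_b)
print(mu, var, C, R, sep="\n\n")
```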

Standardize the variables
For each variable, replace the values x_ij by y_ij = (x_ij - µ_i) / σ_i
(subtract the mean and divide by the standard deviation, so every variable has mean 0 and variance 1).
The covariance matrix of the standardized variables is the correlation matrix R of the original data.
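A sketch of standardization on the same hypothetical matrix, confirming that the covariance of the standardized data is the correlation matrix:

```python
import numpy as np

X = np.array([[5.1, 3.5, 1.4], [4.9, 3.0, 1.4], [6.2, 2.9, 4.3],
              [5.9, 3.0, 5.1], [6.7, 3.1, 4.4], [5.5, 2.4, 3.8]])

# y_ij = (x_ij - mu_i) / sigma_i for every variable (column)
Y = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

R = np.cov(Y, rowvar=False)       # covariance of the standardized variables...
print(np.allclose(R, np.corrcoef(X, rowvar=False)))   # ...equals R: prints True
```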

Short digression: Eigenvalues and Eigenvectors
AX = λX, i.e.,
a_11 x_1 + a_12 x_2 + a_13 x_3 + … + a_1p x_p = λ x_1
a_21 x_1 + a_22 x_2 + a_23 x_3 + … + a_2p x_p = λ x_2
…
a_p1 x_1 + a_p2 x_2 + a_p3 x_3 + … + a_pp x_p = λ x_p
Here the λs are the eigenvalues, and the solution X for each λ is the corresponding eigenvector.

Short digression: To find the eigenvalues and eigenvectors
Solve the characteristic equation det(A - λI) = 0, where
λI = | λ  0 |
     | 0  λ |
Example: A = | -9   4 |
             |  7  -6 |
Characteristic equation: (-9 - λ)(-6 - λ) - 28 = 0
Real eigenvalues: -13 and -2
Eigenvector of eigenvalue -13: (-1, 1)
Eigenvector of eigenvalue -2: (4, 7)
Verify: A(-1, 1)ᵀ = (13, -13)ᵀ = -13·(-1, 1)ᵀ, and A(4, 7)ᵀ = (-8, -14)ᵀ = -2·(4, 7)ᵀ
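The same worked example checked with numpy:

```python
import numpy as np

A = np.array([[-9.0, 4.0],
              [7.0, -6.0]])

vals, vecs = np.linalg.eig(A)
print(vals)   # -13 and -2 (order may vary)

# Each column of `vecs` is a unit-length eigenvector: a scalar multiple of
# (-1, 1) for lambda = -13 and of (4, 7) for lambda = -2.
for lam, v in zip(vals, vecs.T):
    print(lam, v, np.allclose(A @ v, lam * v))   # True for each pair
```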

Next step in finding the PCs: find the eigenvalues and eigenvectors of the correlation matrix R.
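In numpy this step is one call; a stand-in R is used here, since the slides' numeric data is not reproduced:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(49, 5))            # hypothetical stand-in data
R = np.corrcoef(X, rowvar=False)        # correlation matrix

vals, vecs = np.linalg.eigh(R)          # eigh: R is symmetric, so values are real
order = np.argsort(vals)[::-1]          # sort components by eigenvalue, descending
vals, vecs = vals[order], vecs[:, order]
print(vals / vals.sum())                # share of total variance per component
```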

Example
49 birds: 21 survived a storm and 28 died. Five body characteristics are given:
X1: body length; X2: alar extent; X3: beak and head length;
X4: humerus length; X5: keel length
Could we have predicted the fate of each bird from its body characteristics?

Eigenvalues and Eigenvectors of R

Component | Eigenvalue | Share of total variance
1         | ≈ 3.6      | ≈ 72%
2         | ≈ 0.53     | ≈ 10.6%
3         | ≈ 0.39     | ≈ 7.7%
4         | ≈ 0.30     | ≈ 6.0%
5         | ≈ 0.17     | ≈ 3.3%

First eigenvector:  V1 = (0.452, 0.462, 0.451, 0.471, 0.398)
Second eigenvector: V2 = (-0.051, 0.300, 0.325, 0.185, -0.877)

Which principal components are important?
Total variance in the data = λ1 + λ2 + λ3 + λ4 + λ5 = sum of the diagonal entries of R = 5
First eigenvalue ≈ 3.6, i.e., ≈ 72% of the total variance of 5
Second ≈ 10.6%, third ≈ 7.7%, fourth ≈ 6.0%, and fifth ≈ 3.3%
The first PC is the most important, and is sufficient for studying the classification.

Forming the PCs
Z1 = 0.452 Y1 + 0.462 Y2 + 0.451 Y3 + 0.471 Y4 + 0.398 Y5
Z2 = -0.051 Y1 + 0.300 Y2 + 0.325 Y3 + 0.185 Y4 - 0.877 Y5
(the Yi are the standardized variables)
For all 49 birds, compute the first two principal components.
This becomes the new data; classify using it.

For the first bird
X1 = 156, X2 = 245, X3 = 31.6, X4 = 18.5, X5 = 20.5
After standardizing with Yi = (Xi - µi)/σi:
Y1 = (156 - µ1)/3.65 = -0.54
Y2 = (245 - µ2)/5.1 = 0.73
Y3 = (31.6 - µ3)/0.8 = 0.17
Y4 = (18.5 - µ4)/0.56 = 0.05
Y5 = (20.5 - µ5)/0.99 = -0.33
PC1 for the first bird:
Z1 = 0.45×(-0.54) + 0.46×(0.73) + 0.45×(0.17) + 0.47×(0.05) + 0.39×(-0.33) = 0.064
Similarly, Z2 = 0.602
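The whole procedure end to end, as a hedged sketch (pca_scores is a hypothetical helper, and the bird measurements themselves are not in the slides):

```python
import numpy as np

def pca_scores(X, k=2):
    """PC scores via the correlation matrix, i.e., PCA on standardized data."""
    Y = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize: the Y_i
    vals, vecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(vals)[::-1]                     # leading components first
    Z = Y @ vecs[:, order[:k]]                         # scores Z_1 .. Z_k
    return Z, vals[order]

rng = np.random.default_rng(0)
X = rng.normal(size=(49, 5))       # stand-in for the 49 x 5 bird matrix
Z, vals = pca_scores(X, k=2)
print(Z.shape)                     # (49, 2): the reduced data [Z1 Z2]
print(Z[0])                        # the first row's (Z1, Z2)
```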

Reduced Classification Data
Instead of the 49 × 5 matrix of [X1 X2 X3 X4 X5], use the 49 × 2 matrix of [Z1 Z2].

Other Multivariate Data Analysis Procedures
– Factor Analysis
– Discriminant Analysis
– Cluster Analysis

Latent Semantic Analysis and Singular Value Decomposition
Slides based on "Introduction to Information Retrieval", Manning, Raghavan and Schütze, Cambridge University Press, 2008.

Term-Document Matrix
Terms as rows, documents as columns; entry (t, d) counts how often term t occurs in document d.
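A tiny sketch of building such a matrix, reusing the Planes/Flights examples from earlier as hypothetical documents:

```python
from collections import Counter

docs = ["planes for bangkok", "flights to bangkok", "cartesian planes"]
terms = sorted({w for d in docs for w in d.split()})
counts = [Counter(d.split()) for d in docs]

# Rows are terms, columns are documents; entries are raw term counts.
for t in terms:
    print(f"{t:10s}", [counts[j][t] for j in range(len(docs))])
```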

Singular Value Decomposition of the Term-Document Matrix
C = U Σ Vᵀ, where the columns of U are orthonormal term directions, Σ is a diagonal matrix of singular values, and the columns of V are orthonormal document directions.
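A hedged numpy sketch on a small hypothetical 5-term × 4-document count matrix:

```python
import numpy as np

C = np.array([[2, 0, 1, 0],    # hypothetical counts, terms as rows
              [0, 1, 0, 2],
              [1, 1, 0, 0],
              [0, 0, 3, 1],
              [1, 0, 1, 0]], dtype=float)

U, s, Vt = np.linalg.svd(C, full_matrices=False)
print(s)                                      # singular values, descending
print(np.allclose(U @ np.diag(s) @ Vt, C))    # True: C = U Sigma V^T
```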

Low-Rank Approximation
Given an M × N matrix C and a positive integer k, we wish to find an M × N matrix C_k of rank at most k, so as to minimize the Frobenius norm of the matrix difference X = C - C_k, defined to be
‖X‖_F = √( Σ_{i=1}^{M} Σ_{j=1}^{N} X_ij² )
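The minimizer C_k is obtained by keeping only the k largest singular values of the SVD (the Eckart-Young theorem); continuing the hypothetical matrix above:

```python
import numpy as np

def low_rank(C, k):
    """Best rank-k approximation of C in Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

C = np.array([[2, 0, 1, 0], [0, 1, 0, 2], [1, 1, 0, 0],
              [0, 0, 3, 1], [1, 0, 1, 0]], dtype=float)

C2 = low_rank(C, 2)                      # retain the two largest singular values
print(np.linalg.norm(C - C2, 'fro'))     # the minimal achievable ||C - C_k||_F
```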

Example: Term-Document Matrix

Singular Values

Truncated SVD Matrix
Retain only the first two singular values (set the rest to zero in Σ).

Pros and Cons
– A kind of soft clustering on terms
– Documents pass through the LSA processing, and so do the queries
– No known efficient method of computation at the scale of the web (billions of documents!)
– Important: LSA tries to capture association, a recurring notion in AI