Evaluation of Utility of LSA for Word Sense Discrimination Esther Levin, Mehrbod Sharifi, Jerry Ball


1 Evaluation of Utility of LSA for Word Sense Discrimination Esther Levin, Mehrbod Sharifi, Jerry Ball http://www-cs.ccny.cuny.edu/~esther/research/lsa/

2 Outline
- Latent Semantic Analysis (LSA)
- Word sense discrimination through the Context Group Discrimination Paradigm
- Experiments
  - Sense-based clusters (supervised learning)
  - K-means clustering (unsupervised learning)
  - Homonyms vs. polysemes
- Conclusions

3 Latent Semantic Analysis (LSA) (Deerwester '90)
- Represents words and passages as vectors in the same (low-dimensional) semantic space
- Similarity in word meaning is defined by similarity of their contexts

4 LSA Steps
1. Build the document-term co-occurrence matrix (e.g., 1151 documents x 5793 terms)
2. Compute the SVD
3. Reduce dimensionality by keeping the k largest singular values
4. Compute the new vector representations for the documents
5. [Our research] Cluster the new context vectors
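Steps 1 through 4 can be sketched in a few lines of NumPy. This is a toy illustration, not the paper's implementation; `lsa_reduce` and the small example matrix are made up for the sketch:

```python
import numpy as np

def lsa_reduce(X, k):
    """Project a document-term matrix X (docs x terms) onto the
    top-k LSA dimensions, returning one k-dim vector per document."""
    # Step 2: compute the SVD, X = U @ diag(S) @ Vt
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    # Steps 3-4: keep the k largest singular values; the document
    # vectors in the reduced space are the rows of U_k * S_k
    return U[:, :k] * S[:k]

# Toy co-occurrence matrix: 4 documents x 5 terms
X = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 0, 0, 1],
    [0, 0, 3, 1, 0],
    [0, 1, 2, 2, 0],
], dtype=float)

docs_2d = lsa_reduce(X, k=2)
print(docs_2d.shape)  # (4, 2)
```

Step 5 (clustering the reduced context vectors) is what the rest of the talk is about.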

5 Context Vectors of an Ambiguous Word
Inducing the senses of an ambiguous word from the contextual similarity of its occurrences: the Context Group Discrimination Paradigm (Schütze '98)

6 Context Group Discrimination Paradigm (Schütze '98)
[Figure: context vectors form two clusters, Sense 1 and Sense 2; the within-cluster distance a is smaller than the between-cluster distance b]
1. Cluster the context vectors
2. Compute the centroids (sense vectors)
3. Classify new contexts based on distance to centroids
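A minimal sketch of the three steps, using plain NumPy and Euclidean distance (the talk later compares this with L1 and cosine); `kmeans` and `classify` are illustrative helpers, not the authors' code, and the init is deliberately deterministic for brevity:

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Step 1: cluster the context vectors (Lloyd's algorithm,
    seeded with the first k points for determinism)."""
    centroids = X[:k].copy()
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                # Step 2: the cluster centroids are the sense vectors
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

def classify(contexts, sense_vectors):
    """Step 3: assign each new context to the nearest sense vector."""
    d = np.linalg.norm(contexts[:, None] - sense_vectors[None], axis=2)
    return d.argmin(axis=1)

# Two artificial "senses" as well-separated blobs of context vectors
X = np.array([[0, 0], [10, 10], [0, 1], [1, 0], [10, 11], [11, 10]], float)
senses, labels = kmeans(X, k=2)
print(classify(np.array([[0.5, 0.5], [10.5, 10.5]]), senses))  # [0 1]
```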

7 Experiments

8 Experimental Setup
Corpus (Leacock '93):
- Line (3 senses, 1151 instances)
- Hard (2 senses, 752 instances)
- Serve (2 senses, 1292 instances)
- Interest (3 senses, 2113 instances)
Context size: full document (small paragraph)
Number of clusters = number of senses

9 Research Objective
Measure how well the different senses of ambiguous words are separated in the LSA-based vector space.
Parameters:
- Dimensionality of the LSA representation
- Distance measure: L1 (city block), L2 (squared Euclidean), cosine
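The three measures can be written directly; a quick sketch of their definitions, not tied to the paper's implementation:

```python
import numpy as np

def l1(u, v):
    """City block (L1) distance."""
    return np.abs(u - v).sum()

def l2(u, v):
    """Squared Euclidean (L2) distance."""
    return ((u - v) ** 2).sum()

def cosine_sim(u, v):
    """Cosine similarity (a similarity, not a distance: higher = closer)."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

u, v = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(l1(u, v), l2(u, v), cosine_sim(u, v))  # 2.0 2.0 0.0
```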

10 Sense-based Clusters
- An instance of supervised learning
- An upper bound on the unsupervised performance of K-means or EM
- Not influenced by the choice of clustering algorithm
[Figure: examples of best-case and worst-case separation]

11 Sense-based Clusters: Accuracy
Training: find the sense vectors from 90% of the data.
Testing: assign the remaining 10% of the data to the closest sense vector and evaluate by comparing this assignment to the sense tags.
Splits are chosen by random selection, with cross-validation.
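One such 90/10 evaluation round can be sketched as follows; `split_accuracy` and the synthetic data are illustrative (cross-validation would average this over repeated random splits):

```python
import numpy as np

def split_accuracy(X, tags, test_frac=0.1, seed=0):
    """One random split: sense vectors (centroids) computed from the
    training 90%, nearest-centroid classification of the held-out 10%."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = max(1, int(len(X) * test_frac))
    test, train = idx[:n_test], idx[n_test:]
    senses = np.unique(tags)
    centroids = np.array([X[train][tags[train] == s].mean(axis=0)
                          for s in senses])
    d = np.linalg.norm(X[test][:, None] - centroids[None], axis=2)
    pred = senses[d.argmin(axis=1)]
    return (pred == tags[test]).mean()

# Synthetic contexts: two well-separated "senses", 10 instances each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.2, (10, 2)), rng.normal(3, 0.2, (10, 2))])
tags = np.array([0] * 10 + [1] * 10)
print(split_accuracy(X, tags))
```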

12 Evaluating Clustering Quality: Tightness and Separation
Dispersion: intra-cluster scatter (what K-means minimizes)
Silhouette: combines intra-cluster tightness and inter-cluster separation
- a(i): average distance of point i to all other points in the same cluster
- b(i): average distance of point i to the points in the closest other cluster
- s(i) = (b(i) - a(i)) / max(a(i), b(i))
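From the definitions of a(i) and b(i), the silhouette value is s(i) = (b(i) - a(i)) / max(a(i), b(i)). A direct, unoptimized NumPy sketch (for real use, scikit-learn's `silhouette_score` does the same):

```python
import numpy as np

def silhouette_values(X, labels):
    """s(i) = (b(i) - a(i)) / max(a(i), b(i)) for every point i."""
    D = np.linalg.norm(X[:, None] - X[None], axis=2)  # pairwise distances
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself from a(i)
        a = D[i, same].mean() if same.any() else 0.0
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s

# Two tight, well-separated clusters: average silhouette close to 1
X = np.array([[0, 0], [0, 1], [10, 0], [10, 1]], float)
print(silhouette_values(X, np.array([0, 0, 1, 1])).mean())
```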

13 More on the Silhouette Value
[Figure: point i and its closest other cluster; a(i) is the average of the distances to points in its own cluster (blue lines), b(i) the average of the distances to the closest other cluster (yellow lines)]
- s(i) near 1: points are perfectly clustered
- s(i) near 0: points could belong to one cluster or another
- s(i) near -1: points belong to the wrong cluster

14 Evaluating Clustering Quality: Tightness and Separation
Average silhouette value (two example clusterings):
  Cosine:  0.9639 / -0.0876
  L1:      0.7355 / -0.0504
  L2:      0.9271 / -0.0879

15 Sense-based Clusters: Discrimination Accuracy
Baseline: percentage of the majority sense
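The baseline is just the relative frequency of the most frequent sense tag; a one-liner sketch (`majority_baseline` is an illustrative name):

```python
from collections import Counter

def majority_baseline(sense_tags):
    """Accuracy of always predicting the most frequent sense."""
    return Counter(sense_tags).most_common(1)[0][1] / len(sense_tags)

# Toy tag list where one sense covers 2 of 3 instances
print(majority_baseline(["a", "a", "b"]))  # 0.666...
```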

16 16 Sense-based Clusters: Average Silhouette Value

17 Sense-based Clusters: Results
- Good discrimination accuracy
- Low silhouette value
How is that possible?

18 Unsupervised Learning with K-means (cosine measure)
[Figure legend: start randomly / most compact result / start with sense vector / sense-based clustering / training-testing]
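The "most compact result" in the comparison can be obtained by restarting K-means several times and keeping the run with the lowest within-cluster dispersion. A sketch under simplifying assumptions (Euclidean distance for brevity, whereas the slide uses the cosine measure; `most_compact` is an illustrative helper):

```python
import numpy as np

def kmeans_once(X, k, rng, iters=50):
    """One K-means run from a random initialization; returns
    (centroids, labels, within-cluster dispersion)."""
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    disp = sum(((X[labels == j] - centroids[j]) ** 2).sum() for j in range(k))
    return centroids, labels, disp

def most_compact(X, k, n_starts=10, seed=0):
    """Keep the most compact (lowest-dispersion) of n_starts runs."""
    rng = np.random.default_rng(seed)
    return min((kmeans_once(X, k, rng) for _ in range(n_starts)),
               key=lambda run: run[2])

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], float)
centroids, labels, disp = most_compact(X, k=2)
print(round(disp, 2))  # 2.67 for the correct two-blob partition
```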

19 Unsupervised Learning with K-means

20 Polysemes vs. Homonyms
Polysemes: words with multiple related meanings
Homonyms: words with the same spelling but completely different meanings

21 Pseudo Words as Homonyms (Schütze '98)
… find it hard to believe …         becomes  … find it x to believe …
… exactly how to say a line and …   becomes  … exactly how to say a x and …
… about 30 minutes and serve warm … becomes  … about 30 minutes and x warm …
… set the interest rate on the …    becomes  … set the x rate on the …
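Pseudo-words of this kind are straightforward to generate: replace every occurrence of each chosen word with the same placeholder token, keeping the original word as the gold sense tag. A sketch with an illustrative helper (not the authors' code), applied to the examples above:

```python
import re

def make_pseudoword(sentences_by_word, pseudo="x"):
    """Merge several real words into one artificial ambiguous word.
    Returns (rewritten sentence, true sense tag) pairs."""
    out = []
    for word, sentences in sentences_by_word.items():
        pattern = re.compile(r"\b" + re.escape(word) + r"\b")
        for sent in sentences:
            out.append((pattern.sub(pseudo, sent), word))
    return out

examples = {
    "hard": ["find it hard to believe"],
    "line": ["exactly how to say a line and"],
    "serve": ["about 30 minutes and serve warm"],
    "interest": ["set the interest rate on the"],
}
for sent, sense in make_pseudoword(examples):
    print(f"{sense}: {sent}")
```

Because the true word is kept as a tag, discrimination accuracy on the pseudo-word can be scored exactly as for a real sense-tagged corpus.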

22 Polysemes vs. Homonyms in LSA Space
[Figure: results across LSA dimensions for the pseudo words; points on red lines are the most compact cluster out of 10 experiments]
The correlation between the compactness of clusters and discrimination accuracy is higher for homonyms than for polysemes.

23 Conclusions
- Good unsupervised sense discrimination performance for homonyms
- Major deterioration in sense discrimination of polysemes in the absence of supervision
- The benefit of dimensionality reduction is computational only (no peak in performance at lower dimensions)
- The cosine measure performs better than L1 and L2

