Multi-Prototype Vector Space Models of Word Meaning


1 Multi-Prototype Vector Space Models of Word Meaning Authors: Joseph Reisinger & Raymond J. Mooney Review by: Nitish Gupta (Roll Number: 10461)

2 Introduction Automatically judging the degree of semantic similarity between words is an important task. It is useful in text classification, information retrieval, textual entailment and other language processing tasks. The empirical approach to finding semantic similarity between words relies on the Distributional Hypothesis, i.e. that similar words appear in similar contexts. Traditionally, each word type is represented by a single "prototype" vector of contextual features derived from co-occurrence information, and semantic similarity is measured with some measure of vector distance, typically cosine similarity.
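The single-prototype setup described above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: the function names and the bag-of-words context representation are assumptions, and real systems would weight features (e.g. with tf-idf) rather than use raw counts.

```python
from collections import Counter
import math

def prototype_vector(contexts):
    """Build one co-occurrence ("prototype") vector for a target word.
    `contexts` is a list of token lists, each drawn from a window around
    one occurrence of the target word; counts are simply pooled."""
    vec = Counter()
    for tokens in contexts:
        vec.update(tokens)
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Two words are then compared by the cosine of their pooled context vectors, regardless of which sense produced each context.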

3 Motivation Traditional vector-space models represent a word with a single "prototype" vector that is independent of context, but the meaning of a word clearly depends on context. A single-vector model cannot handle phenomena like homonymy and polysemy. It also cannot capture the fact that word-level similarities violate the triangle inequality. E.g., the word club is similar to both bat and association, but bat and association are not similar to each other; how similar club is to each of them clearly depends on the context in which club is used.

4 Methodology

5 Image showing the methodology of obtaining clusters from different contextual appearances of the word 'position'. The black star shows the centroid of the vectors as would have been computed by a single-vector model. The different clusters and colored stars show the different sense-specific prototype vectors pertaining to the different contexts in which the word 'position' was used in the corpus.
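The clustering step pictured above groups the context vectors of a word into sense-specific prototypes. The paper clusters with a mixture of von Mises-Fisher distributions; as a rough stand-in, here is a minimal spherical k-means sketch (function names and the deterministic first-k initialisation are assumptions for illustration):

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def spherical_kmeans(vectors, k, iters=20):
    """Cluster unit-normalised context vectors by cosine similarity.
    Returns (centroids, assignments). Initialised from the first k
    points for determinism; a real system would initialise randomly."""
    data = [normalize(v) for v in vectors]
    centroids = [list(v) for v in data[:k]]
    assign = [0] * len(data)
    for _ in range(iters):
        # Assign each context vector to its most similar centroid.
        for i, v in enumerate(data):
            assign[i] = max(range(k),
                            key=lambda c: sum(a * b for a, b in zip(v, centroids[c])))
        # Recompute each centroid as the normalised mean of its members.
        for c in range(k):
            members = [data[i] for i in range(len(data)) if assign[i] == c]
            if members:
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return centroids, assign
```

Each resulting centroid plays the role of one colored star in the figure: a sense-specific prototype vector for the word.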

6 Measuring Semantic Similarity Given two words w and w', each with K sense prototypes π_1(·), …, π_K(·), the authors define two noncontextual clustered similarity metrics to measure the similarity of isolated words:

AvgSim(w, w') = (1/K²) Σ_{j=1..K} Σ_{k=1..K} d(π_j(w), π_k(w'))
MaxSim(w, w') = max_{1≤j≤K, 1≤k≤K} d(π_j(w), π_k(w'))

where d(·, ·) is cosine similarity. In AvgSim, word similarity is computed as the average similarity over all pairs of prototype vectors of the two words; since every pair of prototypes contributes, two words are judged similar if many of their senses are similar. In MaxSim, similarity is the maximum over all pairwise prototype similarities; since only the closest pair of prototypes contributes, two words are judged similar if even one pair of their senses is very close.
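Both metrics reduce to simple aggregations over all prototype pairs. A minimal sketch, assuming dense prototype vectors and cosine as the distance function d (function names are illustrative, not from the paper's code):

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def avg_sim(protos_w, protos_v, d=cosine):
    """AvgSim: mean similarity over all prototype pairs of two words."""
    return sum(d(p, q) for p in protos_w for q in protos_v) / (
        len(protos_w) * len(protos_v))

def max_sim(protos_w, protos_v, d=cosine):
    """MaxSim: similarity of the single closest pair of prototypes."""
    return max(d(p, q) for p in protos_w for q in protos_v)
```

For a word like club, MaxSim against bat would pick out only the sporting-equipment prototype, while AvgSim would be dragged down by the unrelated association sense.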

7 Experimental Evaluation The corpora used by the authors include: A snapshot of Wikipedia taken on Sept. 29th, 2009, with Wikitext markup removed and articles with fewer than 100 words discarded. The third edition of the English Gigaword Corpus, with articles containing fewer than 100 words removed. Judging Semantic Similarity

8 Predicting Near-Synonyms Here the multi-prototype model's ability to determine the words most closely related to a target word is tested. The top k most similar words were computed for each prototype of each target word. For each prototype of each word, a result from the multi-prototype vector model and one chosen by a human were shown to another human judge. Quality was measured by how frequently the multi-prototype method's choice was preferred. The results show that the system performs markedly better on homonymous words than on polysemous words, but with the right number of clusters the polysemous words also give good results.
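The per-prototype near-synonym retrieval described above amounts to a top-k nearest-neighbour query for each sense prototype. A minimal sketch, assuming a vocabulary of single-prototype word vectors and cosine similarity (names and the toy vectors are illustrative assumptions):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k_neighbors(target_protos, vocab_vectors, k):
    """For each prototype of the target word, return the k vocabulary
    words whose vectors are most similar to that prototype."""
    results = []
    for proto in target_protos:
        ranked = sorted(vocab_vectors,
                        key=lambda w: cosine(proto, vocab_vectors[w]),
                        reverse=True)
        results.append(ranked[:k])
    return results
```

For a homonym like club, the sport-sense prototype should retrieve words like bat while the organisation-sense prototype retrieves words like association, which is why homonyms benefit most from the multi-prototype model.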

9 Thank You!! Questions?

