From Frequency to Meaning: Vector Space Models of Semantics

1 From Frequency to Meaning: Vector Space Models of Semantics
Peter D. Turney, Patrick Pantel

2 VSM Applications
- Originally developed for the SMART information retrieval system, which used VSMs to find documents relevant to a search query.
- VSM techniques are used in modern search engines.
- More recently, VSMs have been applied to multiple-choice question answering (TOEFL, SAT).

3 Why VSMs?
- Extract knowledge automatically from corpora, so they don't require hand-coded knowledge or ontologies.
- Word similarity tasks otherwise typically require large lexicons (like WordNet), which are much harder to come by than large corpora.
- VSMs are successful at measuring similarity of meaning between words, phrases, and documents.

4 Types of VSMs
Three types of matrices:
- Term-document
- Word-context
- Pair-pattern

5 Term-Document VSM
Row vectors: terms
Column vectors: documents

6 Term-Document VSM
Relies upon the bag of words hypothesis: the frequencies of words in a document tend to indicate the relevance of the document to a query. A column vector in a term-document matrix therefore tends to indicate what the document is about.
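A minimal sketch of this structure, assuming a toy two-document corpus (the documents and names below are hypothetical):

    # Build a term-document frequency matrix; each column describes one document.
    from collections import Counter

    docs = ["the mason cuts stone", "the carpenter works with wood"]
    vocab = sorted({w for d in docs for w in d.split()})
    counts = [Counter(d.split()) for d in docs]
    matrix = [[c[t] for c in counts] for t in vocab]  # rows: terms, columns: documents
    for term, row in zip(vocab, matrix):
        print(f"{term:10s} {row}")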

7 Word-Context VSM
Rows: words
Columns: contexts
Essentially the same as the term-document matrix, but the focus is on the row vectors.

8 Word-Context VSM
Relies upon the distributional hypothesis: words that occur in similar contexts tend to have similar meanings. Similar row vectors in a word-context matrix therefore indicate words with similar meanings.
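A minimal sketch of word-context counts using a two-word window (the toy sentence and window size are arbitrary choices):

    # Count, for each word, how often every other word appears within +/-2 positions.
    from collections import defaultdict

    tokens = "the mason cuts stone and the carpenter cuts wood".split()
    window = 2
    counts = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[w][tokens[j]] += 1  # row: word, column: context word
    print(dict(counts["cuts"]))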

9 Pair-Pattern VSM
Rows: pairs of words (mason:stone, carpenter:wood)
Columns: patterns ("X cuts Y", "X works with Y")

                   | X cuts Y | X works with Y | X breaks Y
    mason:stone    |          |                |
    carpenter:wood |          |                |
    mason:wood     |          |                |

10 Pair-Pattern VSM
Similarity of columns in a pair-pattern matrix indicates similarity of the corresponding patterns.
Extended distributional hypothesis: patterns that co-occur with similar pairs tend to have similar meanings.
For example, the patterns "X solves Y" and "Y is solved by X" tend to co-occur with the same pairs, which indicates that they have similar meanings.

11 Pair-Pattern VSM
Similarity of rows in a pair-pattern matrix indicates similarity of the corresponding word pairs.
Latent relation hypothesis: pairs of words that co-occur in similar patterns tend to have similar semantic relations. This is the inverse of the previous hypothesis.

12 Linguistic Processing
- Tokenization
- Normalization
- Annotation
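A minimal sketch of the first two steps (annotation, such as POS tagging or parsing, usually relies on an external tool and is omitted here):

    # Tokenize on non-alphanumeric boundaries and normalize to lowercase.
    import re

    def tokenize(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    print(tokenize("The Mason cuts Stone."))  # ['the', 'mason', 'cuts', 'stone']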

13 Mathematical Processing
- Generate frequencies from corpora
- Adjust the weights of elements in the matrix
- Smoothing
- Measure similarity of vectors

14 Building the Frequency Matrix
Scan through the corpus, recording events and their frequencies in a hash table or database. Use the resulting data structure to build the frequency matrix, as sketched below.
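A minimal sketch of this two-pass approach (toy in-memory corpus; a real system would stream a large corpus and might use an on-disk store):

    # Pass 1: record (term, document) event frequencies in a hash table.
    from collections import defaultdict

    docs = {"d1": "the mason cuts stone", "d2": "the carpenter works with wood"}
    events = defaultdict(int)
    for doc_id, text in docs.items():
        for token in text.split():
            events[(token, doc_id)] += 1

    # Pass 2: materialize the frequency matrix from the hash table.
    terms = sorted({t for t, _ in events})
    matrix = [[events[(t, d)] for d in docs] for t in terms]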

15 Weighting Elements
Give more weight to surprising events and less weight to expected events; surprising events are more discriminative of semantic similarity. For example, in the context of 'rat' and 'mouse', 'dissect' and 'exterminate' are much more discriminative than 'have' and 'like'.

16 Weighting Elements
- Term frequency * inverse document frequency (tf-idf): an element gets a high weight if the corresponding term occurs frequently in the corresponding document but is rare in other documents.
- Pointwise Mutual Information (PMI)
- Length normalization: long documents tend to be favored, which is corrected by length normalization.
- Term weighting: terms like 'hostage' and 'hostages' tend to be correlated but are not normalized to the same term, so their weights are reduced when they co-occur in a document.
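As worked formulas: tf-idf weights an element as tf(t, d) * log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t; PMI weights a word-context cell as log(p(w, c) / (p(w) * p(c))). A minimal tf-idf sketch (log-scaled idf is one of several common variants):

    # Apply tf-idf weighting to a term-document frequency matrix.
    import math

    def tf_idf(matrix):
        n_docs = len(matrix[0])
        weighted = []
        for row in matrix:
            df = sum(1 for f in row if f > 0)          # document frequency of the term
            idf = math.log(n_docs / df) if df else 0.0
            weighted.append([f * idf for f in row])    # tf * idf
        return weighted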

17 Smoothing
Improve performance by limiting the number of vector components. Vectors that share no non-zero coordinates are unrelated and can be ignored. Smoothing algorithms set elements below a certain weight to zero and keep the elements that are more relevant.
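A minimal sketch of this kind of truncation (the cutoff value is an arbitrary assumption and would be tuned per task):

    # Zero out weights below a cutoff, keeping only the strongest components.
    def truncate(row, cutoff=0.1):
        return [w if w >= cutoff else 0.0 for w in row]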

18 Singular Value Decomposition
Decomposes the matrix into three matrices (U, Σ, V), where U and V are in column-orthonormal form and Σ is a diagonal matrix of singular values. For a rank-k approximation, keep the top k singular values in Σ and the corresponding columns of U and V.
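A minimal sketch with NumPy (the random stand-in matrix and the rank k = 2 are arbitrary choices):

    # Truncated SVD: keep the top-k singular values/vectors.
    import numpy as np

    X = np.random.rand(6, 4)                       # stand-in for a frequency matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = 2
    X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # best rank-k approximation of X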

19 Comparing the Vectors
The most popular measure is the cosine. Vectors of frequent words tend to be long, while vectors of rare words tend to be short. The cosine makes vector length irrelevant to similarity, since it measures only the angle between the vectors.
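As a worked formula, cos(x, y) = (x . y) / (||x|| ||y||). A minimal sketch:

    # Cosine similarity: dot product normalized by both vector lengths.
    import math

    def cosine(x, y):
        dot = sum(a * b for a, b in zip(x, y))
        norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
        return dot / norm if norm else 0.0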

20 Efficient Comparisons
- Sparse matrix multiplication: skip pairs of vectors that share no non-zero coordinates.
- Distributed implementation using MapReduce.
- Randomized algorithms: project large vectors down to small vectors while losing minimal information (see the sketch below).
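A minimal sketch of random projection (the target dimension k is an arbitrary choice; by the Johnson-Lindenstrauss lemma, pairwise distances are approximately preserved):

    # Project rows of X from d dimensions down to k with a random Gaussian matrix.
    import numpy as np

    def random_project(X, k, seed=0):
        rng = np.random.default_rng(seed)
        R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
        return X @ R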

21 Implementations
Term-Document Matrix: Lucene
- Content: webpages, PDF documents, images, video, etc.
- Fields: developer-defined parts of these content elements.
- Columns: documents. Rows: fields.

22 Implementations
Word-Context Matrix: Semantic Vectors
An open-source project that implements VSMs and random projection to measure word similarity. It uses Lucene to create a term-document matrix, then uses random projection to reduce the dimensionality.

23 Implementations
Pair-Pattern Matrix: Latent Relational Analysis
Builds a pair-pattern matrix from a textual corpus. Uses WordNet to mitigate sparseness: <Korea, Japan> is expanded to <South Korea, Japan>, <Republic of Korea, Japan>, <Korea, Nippon>, etc.

24 Applications: Term-Document
- Document retrieval
- Document clustering
- Document classification
- Essay grading
- Document segmentation
- Question answering
- Call routing

25 Applications: Word-Context
- Word similarity
- Word clustering
- Word classification
- Automatic thesaurus generation
- Word sense disambiguation (WSD)
- Context-sensitive spelling correction
- Semantic role labeling (SRL)
- Query expansion
- Textual advertising
- Information extraction / NER

26 Applications: Pair-Pattern
- Relational similarity
- Pattern similarity
- Relational clustering
- Relational classification
- Relational search
- Automatic thesaurus generation
- Analogical mapping

27 The Future
VSMs typically don't account for word order (pair-pattern matrices are an exception). Researchers such as Clark and Pulman (2007) and Widdows and Ferraro (2008) are working on handling word order in VSMs.

