Word Meaning and Similarity


Word Meaning and Similarity
Word Similarity: Distributional Similarity (I)

Problems with thesaurus-based meaning
We don't have a thesaurus for every language. Even if we do, thesauri have problems with recall:
- Many words are missing
- Most (if not all) phrases are missing
- Some connections between senses are missing
- Thesauri work less well for verbs and adjectives, which have less structured hyponymy relations

Distributional models of meaning
Also called vector-space models of meaning. They offer much higher recall than hand-built thesauri, although they tend to have lower precision.
Zellig Harris (1954): “oculist and eye-doctor … occur in almost the same environments…. If A and B have almost identical environments we say that they are synonyms.”
Firth (1957): “You shall know a word by the company it keeps!”

Intuition of distributional word similarity
Nida's example:
- A bottle of tesgüino is on the table
- Everybody likes tesgüino
- Tesgüino makes you drunk
- We make tesgüino out of corn.
From the context words, humans can guess that tesgüino means an alcoholic beverage like beer.
Intuition for the algorithm: two words are similar if they have similar word contexts.

Reminder: Term-document matrix
Each cell holds the count of term t in document d: tf_t,d.
Each document is a count vector in ℕ^|V|: a column of the matrix.

Reminder: Term-document matrix Two documents are similar if their vectors are similar

The words in a term-document matrix
Each word is a count vector in ℕ^D: a row of the matrix.

The words in a term-document matrix Two words are similar if their vectors are similar
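As a concrete (if toy) illustration, here is a minimal Python sketch of such a term-document count matrix; the four short "documents" and their contents are made up for illustration only:

```python
from collections import Counter

import numpy as np

# Four made-up toy documents (any real collection would do).
docs = {
    "d1": "as you like it as it is",
    "d2": "twelfth night or what you will",
    "d3": "the tragedy of julius caesar",
    "d4": "the life of henry v",
}

vocab = sorted({w for text in docs.values() for w in text.split()})
doc_ids = list(docs)

# counts[i, j] = tf_{t,d}: how often vocabulary word i occurs in document j.
counts = np.zeros((len(vocab), len(doc_ids)), dtype=int)
for j, d in enumerate(doc_ids):
    tf = Counter(docs[d].split())
    for i, w in enumerate(vocab):
        counts[i, j] = tf[w]

print(counts[:, 0])                  # one document: a column vector in N^|V|
print(counts[vocab.index("you")])    # one word: a row vector in N^D
```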

The Term-Context matrix
Instead of using entire documents, use smaller contexts:
- a paragraph
- a window of 10 words
A word is now defined by a vector of counts over context words.
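A minimal sketch of building such counts with a word window in Python; the toy corpus (the tesgüino sentences from earlier, without the accent) and the function name are illustrative assumptions, not part of the lecture:

```python
from collections import Counter, defaultdict

def term_context_counts(sentences, window=10):
    """Count how often each context word occurs within +/- `window`
    tokens of each target word (whitespace tokenization for simplicity)."""
    counts = defaultdict(Counter)
    for sent in sentences:
        tokens = sent.lower().split()
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[target][tokens[j]] += 1
    return counts

corpus = [
    "a bottle of tesguino is on the table",
    "everybody likes tesguino",
    "tesguino makes you drunk",
    "we make tesguino out of corn",
]
print(term_context_counts(corpus)["tesguino"].most_common(5))
```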

Sample contexts: 20 words (Brown corpus)
- equal amount of sugar, a sliced lemon, a tablespoonful of apricot preserve or jam, a pinch each of clove and nutmeg,
- on board for their enjoyment. Cautiously she sampled her first pineapple and another fruit whose taste she likened to that of
- of a recursive type well suited to programming on the digital computer. In finding the optimal R-stage policy from that of
- substantially affect commerce, for the purpose of gathering data and information necessary for the study authorized in the first section of this

Term-context matrix for word similarity Two words are similar in meaning if their context vectors are similar

Should we use raw counts?
- For the term-document matrix, we used tf-idf instead of raw term counts.
- For the term-context matrix, Positive Pointwise Mutual Information (PPMI) is common.
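For reference, a small sketch of the tf-idf weighting mentioned above, using the log-scaled tf variant from Jurafsky & Martin (the count matrix here is a made-up toy; other tf and idf variants exist):

```python
import numpy as np

# counts[i, j]: raw count of term i in document j (toy numbers).
counts = np.array([
    [3, 0, 1],
    [0, 2, 0],
    [1, 1, 1],
])
n_docs = counts.shape[1]

df = (counts > 0).sum(axis=1)        # document frequency of each term
idf = np.log10(n_docs / df)          # idf_t = log10(N / df_t)

tf = np.zeros(counts.shape)          # log-scaled term frequency
nz = counts > 0
tf[nz] = 1 + np.log10(counts[nz])

tfidf = tf * idf[:, None]            # w_{t,d} = tf_{t,d} * idf_t
print(tfidf.round(3))
```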

Pointwise Mutual Information
Do events x and y co-occur more often than if they were independent?
PMI(x, y) = log2 [ P(x, y) / (P(x) P(y)) ]
PMI between two words (Church & Hanks 1989): do words w1 and w2 co-occur more often than if they were independent?
PMI(w1, w2) = log2 [ P(w1, w2) / (P(w1) P(w2)) ]
Positive PMI (PPMI) between two words (Niwa & Nitta 1994): replace all PMI values less than 0 with zero.

Computing PPMI on a term-context matrix
Matrix F with W rows (words) and C columns (contexts); f_ij is the number of times word w_i occurs in context c_j.
p_ij = f_ij / Σ_i Σ_j f_ij
p_i* = Σ_j f_ij / Σ_i Σ_j f_ij        p_*j = Σ_i f_ij / Σ_i Σ_j f_ij
pmi_ij = log2 ( p_ij / (p_i* p_*j) )
ppmi_ij = max(pmi_ij, 0)

p(w=information, c=data) = 6/19 = .32
p(w=information) = 11/19 = .58
p(c=data) = 7/19 = .37

pmi(information,data) = log2 ( .32 / (.37*.58) ) = .58 (.57 using full precision)
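The whole computation can be written in a few lines of numpy. The count table below is my reconstruction of the example matrix used in these slides (words apricot, pineapple, digital, information; contexts computer, data, pinch, result, sugar); it is consistent with the probabilities worked out above, but treat the exact counts as an assumption:

```python
import numpy as np

words = ["apricot", "pineapple", "digital", "information"]
contexts = ["computer", "data", "pinch", "result", "sugar"]

# f[i, j] = count of word i with context j (reconstructed toy counts).
f = np.array([
    [0, 0, 1, 0, 1],   # apricot
    [0, 0, 1, 0, 1],   # pineapple
    [2, 1, 0, 1, 0],   # digital
    [1, 6, 0, 4, 0],   # information
], dtype=float)

p = f / f.sum()                      # joint probabilities p_ij (f.sum() = 19)
p_w = p.sum(axis=1, keepdims=True)   # word marginals p_i*
p_c = p.sum(axis=0, keepdims=True)   # context marginals p_*j

with np.errstate(divide="ignore"):   # log2(0) -> -inf, removed by the max below
    pmi = np.log2(p / (p_w * p_c))
ppmi = np.maximum(pmi, 0)

i, j = words.index("information"), contexts.index("data")
print(round(p[i, j], 2), round(float(p_w[i, 0]), 2), round(float(p_c[0, j]), 2))  # 0.32 0.58 0.37
print(round(float(pmi[i, j]), 2))                                                 # 0.57
```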

Weighting PMI
- PMI is biased toward infrequent events.
- Various weighting schemes help alleviate this; see Turney and Pantel (2010).
- Add-one smoothing can also help.
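One scheme reported to work well (described, for example, by Levy et al. 2015 and in Jurafsky & Martin) raises the context distribution to a power α = 0.75, which gives rare contexts slightly higher probability; a self-contained sketch, with the same reconstructed counts as above treated as an assumption:

```python
import numpy as np

def ppmi_smoothed(f, alpha=0.75):
    """PPMI with context-distribution smoothing: P_alpha(c) ~ count(c)**alpha."""
    f = np.asarray(f, dtype=float)
    p = f / f.sum()
    p_w = p.sum(axis=1, keepdims=True)
    col = f.sum(axis=0) ** alpha
    p_c = (col / col.sum())[None, :]          # smoothed context marginals
    with np.errstate(divide="ignore"):
        pmi = np.log2(p / (p_w * p_c))
    return np.maximum(pmi, 0)

f = [[0, 0, 1, 0, 1], [0, 0, 1, 0, 1], [2, 1, 0, 1, 0], [1, 6, 0, 4, 0]]
print(ppmi_smoothed(f).round(2))
```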

Word Meaning and Similarity
Word Similarity: Distributional Similarity (I)

Word Meaning and Similarity
Word Similarity: Distributional Similarity (II)

Using syntax to define a word's context
Zellig Harris (1968): “The meaning of entities, and the meaning of grammatical relations among them, is related to the restriction of combinations of these entities relative to other entities”
Two words are similar if they have similar parse contexts.
Duty and responsibility (Chris Callison-Burch's example):
- Modified by adjectives: additional, administrative, assumed, collective, congressional, constitutional …
- Objects of verbs: assert, assign, assume, attend to, avoid, become, breach …

Co-occurrence vectors based on syntactic dependencies
Dekang Lin, 1998: “Automatic Retrieval and Clustering of Similar Words”
The contexts C are different dependency relations, for example:
- Subject-of “absorb”
- Prepositional-object of “inside”
Counts for the word cell: [the slide's table of dependency-relation counts is not reproduced in this transcript]
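As an illustration of how dependency-based contexts can be extracted with a modern parser (this uses spaCy rather than Lin's 1998 setup, and assumes the en_core_web_sm model is installed; the relation names follow spaCy's labels, not Lin's):

```python
from collections import Counter

import spacy

nlp = spacy.load("en_core_web_sm")

def dependency_contexts(texts, target):
    """Count (dependency relation, other word) contexts for a target lemma."""
    contexts = Counter()
    for doc in nlp.pipe(texts):
        for tok in doc:
            if tok.lemma_ == target and tok.head is not tok:
                # target as dependent of its head, e.g. ("nsubj_of", "absorb")
                contexts[(tok.dep_ + "_of", tok.head.lemma_)] += 1
            elif tok.head.lemma_ == target and tok is not tok.head:
                # target as head of a dependent, e.g. ("has_amod", "red")
                contexts[("has_" + tok.dep_, tok.lemma_)] += 1
    return contexts

print(dependency_contexts(["The cell absorbs nutrients.",
                           "Proteins fold inside the cell."], "cell"))
```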

PMI applied to dependency relations
Hindle, Don. 1990. Noun Classification from Predicate-Argument Structure. ACL.

Object of “drink”   Count   PMI
it                    3      1.3
anything              3      5.2
wine                  2      9.3
tea                   2     11.8
liquid                2     10.5

“Drink it” is more common than “drink wine”, but “wine” is a better “drinkable” thing than “it”.

Reminder: cosine for computing similarity (Sec. 6.3)

cos(v, w) = (v · w) / (|v| |w|) = Σ_i v_i w_i / ( √(Σ_i v_i²) √(Σ_i w_i²) )

The dot product is normalized by the vector lengths; equivalently, it is the dot product of the two unit vectors.
v_i is the PPMI value for word v in context i; w_i is the PPMI value for word w in context i.
cos(v, w) is the cosine similarity of v and w.
See the Law of Cosines (Cosine Rule) Wikipedia page.

Cosine as a similarity metric
- -1: vectors point in opposite directions
- +1: vectors point in the same direction
- 0: vectors are orthogonal
Raw frequency and PPMI values are non-negative, so here the cosine ranges from 0 to 1.

Which pair of words is more similar?

              large   data   computer
apricot         1      0        0
digital         0      1        2
information     1      6        1

cosine(apricot, information) = ?
cosine(digital, information) = ?
cosine(apricot, digital) = ?
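A sketch that computes the three cosines with plain numpy, using the counts from the table above (the zero entries and exact values are reconstructed, so treat them as assumptions; with these numbers the answers come out to roughly .16, .58 and 0):

```python
import numpy as np

vectors = {                       # contexts: large, data, computer
    "apricot":     np.array([1.0, 0.0, 0.0]),
    "digital":     np.array([0.0, 1.0, 2.0]),
    "information": np.array([1.0, 6.0, 1.0]),
}

def cosine(v, w):
    """Cosine similarity: dot product divided by the product of vector lengths."""
    return float(v @ w / (np.linalg.norm(v) * np.linalg.norm(w)))

print(cosine(vectors["apricot"], vectors["information"]))   # ~0.16
print(cosine(vectors["digital"], vectors["information"]))   # ~0.58
print(cosine(vectors["apricot"], vectors["digital"]))       # 0.0
```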

Other possible similarity measures
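The measures typically listed alongside cosine in this material include weighted Jaccard and Dice (and Jensen-Shannon divergence); here is a sketch of the first two for non-negative weighted vectors, following the textbook formulations (exactly which alternatives the original slide showed is an assumption):

```python
import numpy as np

def jaccard(v, w):
    """Weighted Jaccard: shared mass (elementwise minima) over total span (maxima)."""
    return float(np.minimum(v, w).sum() / np.maximum(v, w).sum())

def dice(v, w):
    """Dice coefficient: twice the shared mass over the summed mass of both vectors."""
    return float(2 * np.minimum(v, w).sum() / (v.sum() + w.sum()))

v = np.array([1.0, 6.0, 1.0])
w = np.array([0.0, 1.0, 2.0])
print(jaccard(v, w), dice(v, w))
```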

Evaluating similarity (the same as for thesaurus-based methods)
- Intrinsic evaluation: correlation between algorithm and human word-similarity ratings
- Extrinsic (task-based, end-to-end) evaluation: spelling error detection, WSD, essay grading; taking TOEFL multiple-choice vocabulary tests, e.g. “Levied is closest in meaning to which of these: imposed, believed, requested, correlated”
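For the intrinsic evaluation, the usual measure is a rank correlation (typically Spearman's ρ) between the model's similarity scores and the human ratings; a minimal sketch with scipy, where the word pairs and all scores are invented for illustration rather than taken from a real benchmark:

```python
from scipy.stats import spearmanr

# (word1, word2, human similarity rating) -- invented numbers, not a real dataset.
human = [("car", "automobile", 9.5), ("journey", "voyage", 9.2),
         ("stock", "jaguar", 0.9), ("noon", "string", 0.5)]

# In practice model_scores[(w1, w2)] would be, e.g., the cosine of the two
# words' PPMI vectors; here it is a hand-made lookup table for illustration.
model_scores = {("car", "automobile"): 0.81, ("journey", "voyage"): 0.77,
                ("stock", "jaguar"): 0.12, ("noon", "string"): 0.05}

gold = [score for _, _, score in human]
pred = [model_scores[(w1, w2)] for w1, w2, _ in human]
rho, pval = spearmanr(gold, pred)
print(f"Spearman rho = {rho:.2f}")
```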

Word Meaning and Similarity
Word Similarity: Distributional Similarity (II)