Semantic similarity, vector space models and word-sense disambiguation Corpora and Statistical Methods Lecture 6

Semantic similarity Part 1

Synonymy Different phonological/orthographic words with highly related meanings: sofa / couch, boy / lad. Traditional definition: w1 is synonymous with w2 if w1 can replace w2 in a sentence, salva veritate (preserving truth). Is this ever the case? Can we substitute one word for another and keep the meaning of our sentence identical?

The importance of text genre & register With near-synonyms, there are often register-governed conditions of use. E.g. naive vs gullible vs ingenuous You're so bloody gullible […] […] outside on the pavement trying to entice gullible idiots in […] You're so ingenuous. You tackle things the wrong way. The commentator's ingenuous query could just as well have been prompted […] However, it is ingenuous to suppose that peace process […] (source: BNC)

Synonymy vs. Similarity The contextual theory of synonymy is based on the work of Wittgenstein (1953) and Firth (1957): “You shall know a word by the company it keeps” (Firth 1957). Under this view, perfect synonyms might not exist, but words can be judged as highly similar if people put them into the same linguistic contexts and judge the change to be slight.

Synonymy vs. similarity: example Miller & Charles (1991), weak contextual hypothesis: the similarity of the contexts in which two words appear contributes to the semantic similarity of those words. E.g. snake is similar to [resp. a synonym of] serpent to the extent that we find snake and serpent in the same linguistic contexts. It is much more likely that snake/serpent will occur in similar contexts than snake/toad. NB: this is not a discrete notion of synonymy, but a continuous definition of similarity.

The Miller/Charles experiment Subjects were given sentences with missing words and asked to place words they felt were acceptable in each context. Method to compare words A and B: find sentences containing A; find sentences containing B; delete A and B from the sentences and shuffle them; ask people to choose which sentences to place A and B in. Results: people tend to put similar words in the same contexts, and this is highly correlated with occurrence in similar contexts in corpora.

Issues with similarity “Similar” is a much broader concept than “synonymous”: “contextually related, though differing in meaning”: man / woman, boy / girl, master / pupil; “contextually related, but with opposite meanings”: big / small, clever / stupid.

Uses of similarity Assumption: semantically similar words behave in similar ways. Information retrieval: query expansion with related terms. K nearest neighbours, e.g.: given a set of elements, each assigned to some topic; task: classify an unknown word w by topic; method: find the topic that is most prevalent among w’s semantic neighbours.
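A minimal sketch of the k-nearest-neighbours idea; the toy context vectors, topic labels and the choice of cosine as the similarity function are illustrative assumptions, not part of the original slides.

```python
from collections import Counter
import math

def cosine(u, v):
    """Cosine similarity between two equal-length context vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn_topic(target_vec, labelled, k=3):
    """labelled maps word -> (context vector, topic); return the topic
    most prevalent among the k words most similar to target_vec."""
    neighbours = sorted(labelled.values(),
                        key=lambda vec_topic: cosine(target_vec, vec_topic[0]),
                        reverse=True)[:k]
    return Counter(topic for _, topic in neighbours).most_common(1)[0][0]

# Hypothetical toy data: 3-dimensional context vectors with hand-assigned topics.
labelled = {
    "striker": ([4, 0, 1], "sport"),
    "referee": ([3, 1, 0], "sport"),
    "sonata":  ([0, 5, 1], "music"),
    "violin":  ([1, 4, 0], "music"),
}
print(knn_topic([2, 0, 1], labelled, k=3))  # -> 'sport'
```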

Common approaches Vector-space approaches: represent word w as a vector containing the words (or other features) in the context of w; compare the vectors of w1 and w2; various vector-distance measures are available. Information-theoretic measures: w1 is similar to w2 to the extent that knowing about w1 increases my knowledge of (decreases my uncertainty about) w2.

Vector-space models

Basic data structure Matrix M, where M_ij = no. of times w_i co-occurs with w_j (in some window). Can also use a document-by-word matrix. We can treat matrix cells as boolean: if M_ij > 0, then w_i co-occurs with w_j; otherwise it does not.
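A minimal sketch of building such a word-by-word co-occurrence matrix from a tokenised corpus; the toy corpus and the window size are illustrative assumptions.

```python
from collections import defaultdict

def cooccurrence_matrix(sentences, window=2):
    """Count how often each word pair co-occurs within `window` tokens."""
    counts = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if i != j:
                    counts[w][tokens[j]] += 1
    return counts

# Illustrative toy corpus.
corpus = [["the", "cosmonaut", "went", "spacewalking"],
          ["the", "red", "car", "crashed"]]
M = cooccurrence_matrix(corpus, window=2)
print(M["cosmonaut"]["spacewalking"])       # raw count
print(M["cosmonaut"]["spacewalking"] > 0)   # boolean view of the same cell
```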

Distance measures Many measures take a set-theoretic perspective. Vectors can be binary (indicating co-occurrence or not) or real-valued (indicating frequency or probability); similarity is a function of what two vectors have in common.

Classic similarity/distance measures For boolean vectors (sets): Dice coefficient: Dice(X, Y) = 2|X ∩ Y| / (|X| + |Y|); Jaccard coefficient: Jaccard(X, Y) = |X ∩ Y| / |X ∪ Y|. Both measures also have real-valued versions for frequency (or probability) vectors.

Dice vs. Jaccard Dice(car, truck) on the boolean matrix: (2 * 2) / (4 + 2) ≈ 0.67. Jaccard(car, truck) on the boolean matrix: 2 / 4 = 0.5. Dice is more “generous”; Jaccard penalises lack of overlap more.
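A minimal sketch of the two coefficients over boolean context sets; the car/truck context sets are invented placeholders, chosen only so the counts match the worked example above (2 shared contexts, 4 distinct contexts in total).

```python
def dice(x, y):
    """Dice coefficient over two sets of context words."""
    return 2 * len(x & y) / (len(x) + len(y))

def jaccard(x, y):
    """Jaccard coefficient over two sets of context words."""
    return len(x & y) / len(x | y)

# Illustrative context sets: intersection of size 2, union of size 4.
car   = {"drive", "wheel", "red"}
truck = {"drive", "wheel", "cargo"}
print(dice(car, truck))     # 2*2 / (3+3) = 0.666...
print(jaccard(car, truck))  # 2 / 4       = 0.5
```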

Classic similarity/distance measures Cosine similarity (the cosine of the angle between the 2 vectors), defined for both boolean and real-valued vectors: cos(x, y) = (x · y) / (|x| |y|).

Probabilistic approaches

Turning counts into probabilities Each row of counts can be normalised, e.g. P(spacewalking|cosmonaut) = ½ = 0.5, P(red|car) = ¼ = 0.25. NB: this transforms each row into a probability distribution over contexts for that word.
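A minimal sketch of this row normalisation over the count dictionaries built earlier; the cosmonaut/car counts below are hypothetical, chosen only so the two probabilities above come out as 0.5 and 0.25.

```python
def rows_to_probabilities(counts):
    """Normalise each word's context counts into a probability distribution."""
    probs = {}
    for word, contexts in counts.items():
        total = sum(contexts.values())
        probs[word] = {c: n / total for c, n in contexts.items()}
    return probs

# Hypothetical counts reproducing the two figures on the slide.
counts = {
    "cosmonaut": {"spacewalking": 1, "Soviet": 1},
    "car":       {"red": 1, "old": 1, "full": 1, "truck": 1},
}
P = rows_to_probabilities(counts)
print(P["cosmonaut"]["spacewalking"])  # 0.5
print(P["car"]["red"])                 # 0.25
```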

Probabilistic measures of distance KL-divergence: treat the distribution for w1 as an approximation of the distribution for w2. Problems: asymmetric, D(p||q) ≠ D(q||p), so not so useful for word-word similarity; and if some q_i = 0 (a zero in the denominator) while p_i > 0, then D(p||q) is undefined.
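For reference, the standard definition of the KL-divergence between two distributions p and q:

```latex
D(p \parallel q) \;=\; \sum_i p_i \log \frac{p_i}{q_i}
```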

Probabilistic measures of distance Information radius (aka Jensen-Shannon divergence): the total divergence of p and q from their average; symmetric! Dagan et al. (1997) showed this measure to be superior to KL-divergence when applied to a word sense disambiguation task.
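A minimal sketch of both measures over probability dictionaries like those built above; the two distributions at the bottom are invented for illustration. The KL helper returns infinity when q assigns zero probability where p does not, which is exactly the undefined case noted earlier.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) over dicts of probabilities."""
    d = 0.0
    for context, pi in p.items():
        if pi == 0:
            continue
        qi = q.get(context, 0.0)
        if qi == 0:
            return math.inf   # undefined case: p has mass where q has none
        d += pi * math.log(pi / qi, 2)
    return d

def information_radius(p, q):
    """Jensen-Shannon divergence: total divergence of p and q from their average."""
    contexts = set(p) | set(q)
    avg = {c: (p.get(c, 0.0) + q.get(c, 0.0)) / 2 for c in contexts}
    return kl(p, avg) + kl(q, avg)

# Hypothetical context distributions for two words.
p = {"red": 0.5, "old": 0.25, "fast": 0.25}
q = {"red": 0.25, "old": 0.5, "slow": 0.25}
print(kl(p, q))                  # inf: q gives "fast" zero probability
print(information_radius(p, q))  # finite, and symmetric in p and q
```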

Some characteristics of vector-space measures 1. Very simple conceptually; 2. Flexible: can represent similarity based on document co-occurrence, word co-occurrence, etc.; 3. Vectors can be arbitrarily large, representing wide context windows; 4. Can be expanded to take into account grammatical relations (e.g. head-modifier, verb-argument, etc.).

Grammar-informed methods: Lin (1998) Intuition: the similarity of any two things (words, documents, people, plants) is a function of the information we gain from a joint description of a and b in terms of what they have in common, compared to describing a and b separately. E.g. do we gain more by a joint description of apple and chair (both THINGS…) or of apple and banana (both FRUIT: more specific)?

Lin’s definition (cont’d) Essentially, we compare the information content of the “common” description to the information content of the “separate” descriptions. NB: this is essentially mutual information!
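For reference, Lin’s (1998) information-theoretic definition, which the paraphrase above describes: the information in the common description relative to the information in the full description of the pair.

```latex
\mathrm{sim}(A, B) \;=\; \frac{\log P(\mathrm{common}(A, B))}{\log P(\mathrm{description}(A, B))}
```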

An application to corpora From a corpus-based point of view, what do words have in common? Context, obviously. How do we define context? Just “bag-of-words” (typical of vector-space models), or something more grammatically sophisticated?

Kilgarriff’s (2003) application Definition of the notion of context, following Lin: define F(w) as the set of grammatical contexts in which w occurs; a context is a triple (rel, w, w’), where rel is a grammatical relation, w is the word of interest, and w’ is the other word in rel. Grammatical relations can be obtained using a dependency parser.

[Figure: grammatical co-occurrence matrix for the word “cell”. Source: Jurafsky & Martin (2009), after Lin (1998)]

Example with w = cell Each triple f consists of the relation r, the second word in the relation w’, and the word of interest w. We can now compute the level of association between the word w and each of its triples f, using an information-theoretic measure proposed as a generalisation of pointwise mutual information.

Calculating similarity Given that we have grammatical triples for our words of interest, the similarity of w1 and w2 is a function of: the triples they have in common, and the triples that are unique to each. I.e. the mutual information of what the two words have in common, divided by the sum of the mutual information of what each word has.
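A minimal sketch of this computation; it assumes each word already comes with a dictionary mapping its grammatical triples to an association score (a PMI-style value standing in for the measure above), and the toy scores below are invented for illustration.

```python
def lin_similarity(assoc_w1, assoc_w2):
    """Similarity of two words from {triple: association score} dictionaries:
    information shared by the two words divided by the total information of each."""
    shared = set(assoc_w1) & set(assoc_w2)
    common = sum(assoc_w1[f] + assoc_w2[f] for f in shared)
    total = sum(assoc_w1.values()) + sum(assoc_w2.values())
    return common / total if total else 0.0

# Invented association scores over (relation, other-word) contexts.
master = {("subj-of", "read"): 2.1, ("subj-of", "know"): 1.7,
          ("modifier", "past"): 3.0}
pupil  = {("subj-of", "read"): 1.9, ("subj-of", "know"): 1.5,
          ("pp_at", "school"): 2.4}
print(lin_similarity(master, pupil))  # shared read/know contexts -> moderate score
```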

Sample results: master & pupil. Common: subject-of: read, sit, know; modifier: good, form; possession: interest. Master only: subject-of: ask; modifier: past (cf. past master). Pupil only: subject-of: make, find; PP_at-p: school.

Concrete implementation The online Sketch Engine gives grammatical relations of words, plus a distributional thesaurus which ranks words by similarity to a headword. This is based on the Lin (1998) model.

Limitations (or characteristics) Only applicable as a measure of similarity between words of the same category: it makes no sense to compare grammatical relations of words from different categories. Does not distinguish between near-synonyms and merely “similar” words: student ~ pupil, master ~ pupil. MI is sensitive to low-frequency events: a relation which occurs only once in the corpus can come out as highly significant.