1. I256 Applied Natural Language Processing, Fall 2009
Lecture 8: Words
– Lexical acquisition
– Collocations
– Similarity
– Selectional preferences
Barbara Rosario
2. Lexical acquisition
Develop algorithms and statistical techniques for filling the holes in existing dictionaries and lexical resources by looking at the occurrences of patterns of words in large text corpora:
– Collocations
– Semantic similarity
– Logical metonymy
– Selectional preferences
3. The limits of hand-encoded lexical resources
Manual construction of lexical resources is very costly.
Because language keeps changing, these resources have to be continuously updated.
Quantitative information (e.g., frequencies, counts) has to be computed automatically anyway.
4. The coverage problem
From CS 224N / Ling 280, Stanford, Manning
5. Lexical acquisition
Examples:
– “insulin” and “progesterone” are in WordNet 2.1, but “leptin” and “pregnenolone” are not.
– “HTML” and “SGML” are in, but not “XML” or “XHTML”.
– “Google” and “Yahoo” are in, but not “Microsoft” or “IBM”.
We need some notion of word similarity to know where to locate a new word in a lexical resource.
6. Lexical acquisition
Lexical acquisition problems:
– Collocations
– Semantic similarity
– Logical metonymy
– Selectional preferences
7. Collocations
A collocation is an expression consisting of two or more words that corresponds to some conventional way of saying things:
– Noun phrases: weapons of mass destruction, stiff breeze (but why not *stiff wind?)
– Verbal phrases: to make up
– Not necessarily contiguous: knock … door
Limited compositionality:
– An expression is compositional if its meaning can be predicted from the meaning of its parts.
– Idioms are the most extreme examples of non-compositionality: kick the bucket.
– In collocations there is an element of meaning added to the combination (i.e., the exact meaning cannot be derived directly from its components): white hair, white wine, white woman.
8. Collocations
Non-substitutability:
– We cannot substitute words in a collocation: *yellow wine.
Non-modifiability:
– To get a frog in one’s throat, but *to get an ugly frog in one’s throat.
Useful for:
– Language generation: *powerful tea, *take a decision.
– Machine translation.
An easy way to test whether a combination is a collocation is to translate it into another language:
– Make a decision: *faire une décision (prendre), *fare una decisione (prendere).
9. Subclasses of collocations
Light verbs:
– Make a decision, do a favor
Phrasal verbs:
– To tell off, to make up
Proper names:
– San Francisco, New York
Terminological expressions:
– Hydraulic oil filter (this one is compositional, but we still need to make sure, for example, that it is always translated the same way)
10. Finding collocations
Frequency:
– If two words occur together a lot, that may be evidence that they have a special function.
– But if we sort word pairs by frequency C(w1, w2), then “of the” is the most frequent pair.
– Filter by POS patterns: A N (linear function), N N (regression coefficients), etc.; a counting sketch follows this slide.
Mean and variance of the distance between the words, for non-contiguous collocations:
– She knocked at his door (d = 2)
– A man knocked on the metal front door (d = 4)
Hypothesis testing (see page 162 of Stat NLP):
– How do we know it’s really a collocation? A low mean distance can be accidental (new company).
– We need to know whether two words occur together by chance or because they are a collocation.
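A minimal sketch of the frequency-plus-POS-filter idea, assuming NLTK with its tokenizer and tagger models installed; the sentence and the set of tag patterns kept are illustrative, and exact tags will vary with the tagger:

```python
# Count adjacent word pairs, keeping only Adjective-Noun and Noun-Noun patterns.
from collections import Counter
import nltk

text = "The regression coefficients of the linear function fit the data."
tokens = nltk.word_tokenize(text.lower())
tagged = nltk.pos_tag(tokens)

KEEP = {("JJ", "NN"), ("JJ", "NNS"), ("NN", "NN"), ("NN", "NNS")}
pairs = Counter(
    (w1, w2)
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
    if (t1, t2) in KEEP
)
for (w1, w2), n in pairs.most_common():
    print(w1, w2, n)   # e.g. "regression coefficients", "linear function"
```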
11. Finding collocations
Mutual information measure:
– A measure of how much one word tells us about the other, i.e., the reduction in uncertainty of one word due to knowing about the other.
– It is 0 when the two words are independent (see Stat NLP, pages 66 and 178).
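A minimal sketch of pointwise mutual information computed from raw counts; all the counts below are made-up placeholders:

```python
import math

N = 1_000_000   # total number of bigrams in the corpus (placeholder)
c_w1 = 500      # count of w1 (placeholder)
c_w2 = 800      # count of w2 (placeholder)
c_w1w2 = 60     # count of the bigram (w1, w2) (placeholder)

p_w1, p_w2, p_w1w2 = c_w1 / N, c_w2 / N, c_w1w2 / N
pmi = math.log2(p_w1w2 / (p_w1 * p_w2))   # 0 if w1 and w2 are independent
print(f"PMI = {pmi:.2f} bits")
```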
12. Lexical acquisition
Lexical acquisition problems:
– Collocations
– Semantic similarity
– Logical metonymy
– Selectional preferences
13. Lexical and semantic similarity
Lexical and distributional notions of meaning similarity.
How can we work out how similar in meaning two words are, and what is this useful for?
– IR
– Generalization: semantically similar words behave similarly
– QA, inference, …
We could use anything in the thesaurus:
– Meronymy
– Example sentences/definitions
– In practice, by “thesaurus-based” we usually just mean using the is-a/subsumption/hypernym hierarchy.
Word similarity versus word relatedness:
– Similar words are near-synonyms; related words could be related in any way.
– Car, gasoline: related, not similar. Doctor, nurse, fever: related (by topic). Car, bicycle: similar.
14. Semantic similarity
Two words are similar if they are contextually interchangeable:
– The degree to which one word can be substituted for another in a given context.
– Suit is similar to litigation (but only in the legal context).
Measures of similarity:
– WordNet-based
– Vector-based
– Detecting hyponymy and other relations
15. WordNet: semantic similarity
Whale is very specific (and baleen whale even more so), while vertebrate is more general and entity is completely general. We can quantify this concept of generality by looking up the depth of each synset, as in the sketch below.
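A minimal sketch using NLTK's WordNet interface (assumes the WordNet data has been downloaded with nltk.download('wordnet')); min_depth() gives the length of the shortest hypernym path from the synset up to the root:

```python
from nltk.corpus import wordnet as wn

for name in ['entity.n.01', 'vertebrate.n.01', 'whale.n.02', 'baleen_whale.n.01']:
    synset = wn.synset(name)
    # Depth 0 for the completely general 'entity'; larger for specific synsets.
    print(name, synset.min_depth())
```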
16. WordNet: path similarity
Two words are similar if they are nearby in the thesaurus hierarchy, i.e., there is a short path between them:
– path_similarity assigns a score in the range 0–1 based on the shortest path that connects the concepts in the hypernym hierarchy.
The numbers don’t mean much in themselves, but they decrease as we move away from the semantic space of sea creatures to inanimate objects.
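A minimal sketch of that comparison with NLTK; the synset names follow the standard WordNet inventory, and the scores shown in the comments are approximate:

```python
from nltk.corpus import wordnet as wn

right_whale = wn.synset('right_whale.n.01')
for name in ['minke_whale.n.01', 'orca.n.01', 'tortoise.n.01', 'novel.n.01']:
    other = wn.synset(name)
    # Score in (0, 1]: higher means a shorter path in the hypernym hierarchy.
    print(name, right_whale.path_similarity(other))
# Scores drop from whales (~0.25) to tortoise (~0.08) to novel (~0.04).
```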
17. WordNet: path similarity
From CS 224N / Ling 280, Stanford, Manning
18. WordNet: path similarity
Problems with path similarity:
– It assumes each link represents a uniform distance.
– Instead, we want a metric that lets us represent the cost of each edge independently.
– There has been a whole slew of methods that augment the thesaurus with notions from a corpus (Resnik, Lin, …).
From CS 224N / Ling 280, Stanford, Manning
19. Vector-based lexical semantics
Very old idea: the meaning of a word can be specified in terms of the values of certain ‘features’ (COMPONENTIAL SEMANTICS):
– dog: ANIMATE = +, EAT = MEAT, SOCIAL = +
– horse: ANIMATE = +, EAT = GRASS, SOCIAL = +
– cat: ANIMATE = +, EAT = MEAT, SOCIAL = −
Similarity / relatedness: proximity in feature space.
From CS 224N / Ling 280, Stanford, Manning
20. Vector-based lexical semantics
From CS 224N / Ling 280, Stanford, Manning
21. General characterization of vector-based semantics
Vectors as models of concepts. The CLUSTERING approach to lexical semantics:
1. Define the properties one cares about, and give values to each property (generally, numerical).
2. Create a vector of length n for each item to be classified.
3. Viewing the n-dimensional vector as a point in n-space, cluster points that are near one another.
What changes between models:
1. The properties used in the vector.
2. The distance metric used to decide if two points are ‘close’.
3. The algorithm used to cluster.
A clustering sketch follows this slide.
From CS 224N / Ling 280, Stanford, Manning
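A minimal sketch of steps 2–3, assuming scikit-learn is installed; the feature vectors reuse the toy componential features from slide 19, and the number of clusters is chosen arbitrarily:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy featural vectors: [ANIMATE, EATS_MEAT, SOCIAL]
items = {
    "dog":   [1, 1, 1],
    "cat":   [1, 1, 0],
    "horse": [1, 0, 1],
}
X = np.array(list(items.values()), dtype=float)

# Cluster points that are near one another in the 3-d feature space.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for word, label in zip(items, labels):
    print(word, "-> cluster", label)
```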
22. Distributional similarity: using words as features in a vector-based semantics
The old decompositional semantic approach requires:
– i. Specifying the features.
– ii. Characterizing the value of these features for each lexeme.
Simpler approach: use as features the WORDS that occur in the proximity of that word / lexical entry.
– Intuition: “You shall know a word by the company it keeps.” (J. R. Firth)
More specifically, you can use as ‘values’ of these features:
– The FREQUENCIES with which these words occur near the words whose meaning we are defining.
– Or perhaps the PROBABILITIES that these words occur next to each other.
Some psychological results support this view.
From CS 224N / Ling 280, Stanford, Manning
23. Using neighboring words to specify the meaning of words
Take, e.g., the following corpus:
– John ate a banana.
– John ate an apple.
– John drove a lorry.
We can extract a co-occurrence matrix from it, as in the sketch below.
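A minimal sketch that extracts the co-occurrence counts from this toy corpus, treating the whole sentence as the context window:

```python
from collections import defaultdict
from itertools import permutations

corpus = [
    "John ate a banana",
    "John ate an apple",
    "John drove a lorry",
]

counts = defaultdict(int)
for sentence in corpus:
    words = sentence.lower().split()
    for w, v in permutations(words, 2):   # every ordered pair in the sentence
        counts[(w, v)] += 1

# "ate" co-occurs with "john" in two sentences, "drove" in only one:
print(counts[("ate", "john")], counts[("drove", "john")])   # 2 1
```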
24. Acquiring lexical vectors from a corpus
To construct a vector C(w) for each word w:
1. Scan a text.
2. Whenever a word w is encountered, increment all cells of C(w) corresponding to the words v that occur in the vicinity of w, typically within a window of fixed size.
Differences among methods:
– Size of the window.
– Weighted or not.
– Whether every word in the vocabulary counts as a dimension (including function words such as the or and), or whether instead only some specially chosen words are used (typically the m most common content words in the corpus, or perhaps modifiers only). The words chosen as dimensions are often called CONTEXT WORDS.
– Whether dimensionality reduction methods are applied.
From CS 224N / Ling 280, Stanford, Manning
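A minimal sketch of this procedure with a fixed-size window; the window size and the token stream are illustrative:

```python
from collections import Counter, defaultdict

def lexical_vectors(tokens, window=2):
    """Build C(w): for each occurrence of w, count the words v within
    `window` positions to its left and right."""
    vectors = defaultdict(Counter)
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[w][tokens[j]] += 1
    return vectors

tokens = "john ate a banana john ate an apple john drove a lorry".split()
print(lexical_vectors(tokens)["ate"])   # context counts for "ate"
```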
25. Variant: using only modifiers to specify the meaning of words
From CS 224N / Ling 280, Stanford, Manning
26. The CLUSTERING approach to lexical semantics
– Create a vector of length n for each item to be classified.
– Viewing the n-dimensional vector as a point in n-space, cluster points that are near one another.
– Define a similarity measure (the distance metric used to decide if two points are ‘close’), for example the cosine of the angle between the two vectors.
– (Eventually) choose a clustering algorithm.
From CS 224N / Ling 280, Stanford, Manning
27. The HAL model
Burgess and Lund (1995, 1998):
– A 160-million-word corpus of articles extracted from all newsgroups containing English dialogue.
– Context words: the 70,000 most frequently occurring symbols within the corpus.
– Window size: 10 words to the left and to the right of the word.
– Measure of similarity: cosine (see the sketch below).
Nearest neighbors found:
– Frightened: scared, upset, shy, embarrassed, anxious, worried, afraid
– Harmed: abused, forced, treated, discriminated, allowed, attracted, taught
– Beatles: original, band, song, movie, album, songs, lyrics, British
From CS 224N / Ling 280, Stanford, Manning
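A minimal sketch of the cosine measure used to compare co-occurrence vectors; the vectors below are placeholder counts, not HAL's:

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between u and v; 1.0 means identical direction."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

frightened = np.array([12.0, 3.0, 0.0, 7.0])   # placeholder context counts
scared     = np.array([10.0, 4.0, 1.0, 6.0])
print(cosine(frightened, scared))
```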
28. Latent Semantic Analysis
Landauer et al. (1997, 1998):
– Goal: extract expected contextual usage from passages.
– Steps:
1. Build a word/document co-occurrence matrix.
2. ‘Weight’ each cell (e.g., with tf.idf).
3. Perform a DIMENSIONALITY REDUCTION.
– Argued to correlate well with humans on a number of tests.
From CS 224N / Ling 280, Stanford, Manning
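A minimal sketch of these steps with scikit-learn; the documents, the weighting choice, and the number of latent dimensions are all illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "stocks fell on the news",
]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)        # documents x terms, tf.idf-weighted

svd = TruncatedSVD(n_components=2)   # the dimensionality reduction step
doc_vectors = svd.fit_transform(X)   # each document in a 2-d latent space
print(doc_vectors)
```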
29. Detecting hyponymy and other relations with patterns
Goal: discover new hyponyms, and add them to a taxonomy under the appropriate hypernym.
– Agar is a substance prepared from a mixture of red algae, such as Gelidium, for laboratory or industrial use.
– What does Gelidium mean? How do you know?
30. Hearst approach
Hearst hand-built lexico-syntactic patterns such as “Y such as X”, “Y including X”, and “X and other Y”, where X is then taken to be a hyponym of Y; a sketch of one pattern follows.
From CS 224N / Ling 280, Stanford, Manning
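A minimal sketch of the “Y such as X” pattern applied to the agar sentence from the previous slide; matching two words as the hypernym phrase is a crude stand-in for a proper NP chunk:

```python
import re

sentence = ("Agar is a substance prepared from a mixture of red algae, "
            "such as Gelidium, for laboratory or industrial use.")

# Two words before "such as" as the hypernym phrase, then a capitalized
# hyponym candidate after it.
pattern = re.compile(r"(\w+ \w+),? such as ([A-Z]\w+)")
for hypernym, hyponym in pattern.findall(sentence):
    print(f"{hyponym} is a kind of {hypernym}")   # Gelidium is a kind of red algae
```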
31. Trained algorithm to discover patterns
Snow, Jurafsky, and Ng (2005):
– Collect noun pairs from corpora (752,311 pairs from 6 million words of newswire).
– Identify each pair as a positive or negative example of the hypernym/hyponym relationship (14,387 yes; 737,924 no).
– Parse the sentences and extract patterns (lexical patterns and parse paths).
– Train a hypernym classifier on these patterns, as sketched below.
From CS 224N / Ling 280, Stanford, Manning
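A minimal sketch of the final training step, assuming scikit-learn; the pattern features and labels below are tiny placeholders, not the paper's features or data:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Each noun pair is represented by counts of the patterns it appears in.
pairs = [
    {"X such as Y": 3, "Y and other X": 1},   # hypernym-like contexts
    {"X such as Y": 2},
    {"X went to Y": 4},                       # non-hypernym contexts
    {"X near Y": 2},
]
labels = [1, 1, 0, 0]   # 1 = hypernym pair, 0 = not

vec = DictVectorizer()
X = vec.fit_transform(pairs)
clf = LogisticRegression().fit(X, labels)
print(clf.predict(vec.transform([{"X such as Y": 1}])))   # likely [1]
```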
32.
From CS 224N / Ling 280, Stanford, Manning
33. Evaluation: precision and recall
Precision can be seen as a measure of exactness or fidelity, whereas recall is a measure of completeness. Both are used in information retrieval:
– A perfect precision score of 1.0 means that every result retrieved by a search was relevant (but says nothing about whether all relevant documents were retrieved), whereas a perfect recall score of 1.0 means that all relevant documents were retrieved by the search (but says nothing about how many irrelevant documents were also retrieved).
– Precision is defined as the number of relevant documents retrieved by a search divided by the total number of documents retrieved.
– Recall is defined as the number of relevant documents retrieved by a search divided by the total number of existing relevant documents (which should have been retrieved).
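A minimal sketch of the two definitions on sets of document ids; the sets are illustrative:

```python
retrieved = {1, 2, 3, 4, 5}
relevant  = {3, 4, 5, 6, 7, 8}

true_positives = retrieved & relevant
precision = len(true_positives) / len(retrieved)   # 3/5 = 0.6
recall    = len(true_positives) / len(relevant)    # 3/6 = 0.5
print(precision, recall)
```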
34. Evaluation: precision and recall
In a classification context:
– A perfect precision score of 1.0 for a class C means that every item labeled as belonging to class C does indeed belong to class C (but says nothing about the number of items from class C that were not labeled correctly).
– A perfect recall score of 1.0 means that every item from class C was labeled as belonging to class C (but says nothing about how many other items were incorrectly also labeled as belonging to class C).
35. Precision and recall: the trade-off
Often there is an inverse relationship between precision and recall: it is possible to increase one at the cost of reducing the other. For example, a search engine can increase its recall by retrieving more documents, at the cost of an increasing number of irrelevant documents retrieved (decreasing precision). Similarly, a classification system for deciding whether or not, say, a fruit is an orange can achieve high precision by only classifying fruits with exactly the right shape and color as oranges, but at the cost of low recall, due to the false negatives from oranges that did not quite match the specification.
36.
From CS 224N / Ling 280, Stanford, Manning
37. Lexical acquisition
Lexical acquisition problems:
– Collocations
– Semantic similarity
– Logical metonymy
– Selectional preferences
38. Other lexical semantics tasks
Metonymy is a figure of speech in which a thing or concept is not called by its own name, but by the name of something intimately associated with that thing or concept.
Logical metonymy, for example:
– enjoy the book means enjoy reading the book, and easy problem means a problem that is easy to solve.
39. Other lexical semantics tasks
From CS 224N / Ling 280, Stanford, Manning
40.
From CS 224N / Ling 280, Stanford, Manning
41.
From CS 224N / Ling 280, Stanford, Manning
42. Lexical acquisition
Lexical acquisition problems:
– Collocations
– Semantic similarity
– Logical metonymy
– Selectional preferences
43. Selectional preferences
Most verbs prefer arguments of a particular type; such regularities are called selectional preferences or restrictions:
– Objects of eat tend to be food, subjects of think tend to be people, etc.
– They are “preferences” rather than hard constraints, to allow for metaphors: Fear eats the soul.
Why is this important for NLP?
44. Selectional preferences
Why important?
– To infer meaning from selectional restrictions. Suppose we don’t know the word durian (it is not in the vocabulary); from Susan ate a very fresh durian we can infer that durian is a type of food.
– To rank the possible parses of a sentence, giving higher scores to parses where the verb has “natural” arguments.
45. A model of selectional preferences
Resnik (1993); see page 288 of Stat NLP. Two main concepts:
1. Selectional preference strength: how strongly the verb constrains its direct object (compare eat, find, see).
2. Selectional association between the verb and the semantic class of the object (eat and food).
The higher 1 and 2 are, the less important it is to have an explicit object (i.e., the more likely the implicit-object construction is): Bo ate, but *Bo saw.
A sketch of the strength computation follows.
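A minimal sketch of the strength computation, which Resnik defines as the KL divergence between the verb's object-class distribution P(c|v) and the class prior P(c); the toy distributions below are placeholders:

```python
import math

def strength(p_c_given_v: dict, p_c: dict) -> float:
    """S(v) = sum over classes c of P(c|v) * log2( P(c|v) / P(c) )."""
    return sum(p * math.log2(p / p_c[c]) for c, p in p_c_given_v.items() if p > 0)

p_c = {"food": 0.1, "people": 0.2, "other": 0.7}     # class prior (placeholder)
eat = {"food": 0.9, "people": 0.05, "other": 0.05}   # P(c | eat): very selective
see = {"food": 0.1, "people": 0.2, "other": 0.7}     # P(c | see): ~ the prior

print("S(eat) =", strength(eat, p_c))   # large: eat strongly constrains its object
print("S(see) =", strength(see, p_c))   # ~0: see accepts almost anything
```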
46. Next class
Next time: review of classification.
– Project ideas (likely on October 6)
– Two more assignments (most likely)
– Project proposals (1-2 page description)
– Projects