1
11-1 Chapter 11 Lexical Acquisition
2
11-2 Lecture Overview
– Methodological Issues: Evaluation Measures
– Verb Subcategorization: the syntactic means by which verbs express their arguments
– Attachment Ambiguity
  – The children ate the cake with their hands
  – The children ate the cake with blue icing
– Selectional Preferences: the semantic categorization of a verb's arguments
– Semantic Similarity (refer to IR course): semantic similarity between words
3
11-3 Lexicon
That part of the grammar of a language which includes the lexical entries for all the words and/or morphemes in the language, and which may also include various other information depending on the particular theory of grammar
+ quantitative information, …
+ PP attachment, …
4
11-4 Evaluation Measures
– Precision and Recall
– F Measure
– Fallout
– Receiver Operating Characteristic (ROC) Curve: shows how different levels of fallout (false positives as a proportion of all non-targeted events) influence recall or sensitivity (true positives as a proportion of all targeted events)
5
11-5
tp: true positives
tn: true negatives
fp: false positives (type I errors)
fn: false negatives (type II errors)
Precision P = tp / (tp + fp); Recall R = tp / (tp + fn); F = 2PR / (P + R); Fallout = fp / (fp + tn)
6
11-6 Precision and Recall versus Accuracy and Error
Accuracy = (tp + tn) / (tp + fp + fn + tn); Error = (fp + fn) / (tp + fp + fn + tn)
When tn is huge, it dwarfs all the other numbers:
  tp    fp    fn      tn     Prec    Rec     F      Acc
  25     0   125   99,850   1.000   0.167   0.286   0.9988
  50   100   100   99,750   0.333   0.333   0.333   0.9980
  75   150    75   99,700   0.333   0.500   0.400   0.9978
 125   225    25   99,625   0.357   0.833   0.500   0.9975
 150   275     0   99,575   0.353   1.000   0.522   0.9973
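A minimal sketch in Python of these measures, reproducing the first row of the table above (the function and variable names are illustrative, not from the slides):

```python
def prf_accuracy(tp, fp, fn, tn):
    """Compute precision, recall, F1 and accuracy from a 2x2 contingency table."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# first row of the table: tp=25, fp=0, fn=125, tn=99,850
p, r, f, a = prf_accuracy(25, 0, 125, 99_850)
print(f"prec={p:.3f} rec={r:.3f} F={f:.3f} acc={a:.4f}")
# prec=1.000 rec=0.167 F=0.286 acc=0.9988  (matches the first table row)
```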
7
11-7 Verb Subcategorization I
Verbs express their semantic arguments using different syntactic means. A particular set of syntactic categories that a verb can appear with is called a subcategorization frame.
8
11-8 Verb Subcategorization I
Most dictionaries do not contain information on subcategorization frames. Brent (1993)'s subcategorization frame learner tries to decide, based on corpus evidence, whether verb v takes frame f. It works in two steps.
9
11-9 Verb Subcategorization II
Brent's Lerner system works in two steps:
– Cues: define a regular pattern of words and syntactic categories which indicates the presence of the frame with high certainty. For a particular cue c_j we define a probability of error ε_j that indicates how likely we are to make a mistake if we assign frame f to verb v based on cue c_j.
– Hypothesis testing: define the null hypothesis H_0 as "the frame is not appropriate for the verb". Reject this hypothesis if cue c_j indicates with high probability that H_0 is wrong.
10
11-10 Example
Cues
– Regular pattern for the subcategorization frame "NP NP": (OBJ | SUBJ_OBJ | CAP) (PUNC | CC), where OBJ is an object-only pronoun (e.g., me, him, …), SUBJ_OBJ is a pronoun that can be subject or object (e.g., it, you, …), CAP is any capitalized word, PUNC is punctuation, and CC is a coordinating conjunction.
– […] greet-V Peter-CAP ,-PUNC […]  (correct: the cue indicates a frame that greet does take)
– I came Thursday, before the storm started.  (error: the cue matches, but come does not take the frame)
Null hypothesis testing
– Verb v_i occurs a total of n times in the corpus, and there are m ≤ n occurrences with a cue for frame f_j.
– Reject the null hypothesis H_0 that v_i does not permit f_j if the following probability of error is small:
  p_E = Σ_{k=m}^{n} C(n, k) ε_j^k (1 − ε_j)^(n−k)
  where m is the number of times that v_i occurs with cue c_j and ε_j is the error rate of cue c_j.
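A small sketch of Brent-style hypothesis testing under the formula above (plain Python; the counts, error rate, and significance threshold are made-up illustrations, not Brent's published values):

```python
from math import comb

def cue_error_probability(n, m, eps):
    """Binomial tail: probability of seeing the cue m or more times in n
    occurrences of the verb if every match were an error with probability eps
    (i.e., under H0 that the verb does not permit the frame)."""
    return sum(comb(n, k) * eps**k * (1 - eps)**(n - k) for k in range(m, n + 1))

def permits_frame(n, m, eps, alpha=0.02):
    """Reject H0 ('the verb does not permit the frame') when the observed
    cue counts are too unlikely under H0."""
    return cue_error_probability(n, m, eps) < alpha

# hypothetical numbers: verb seen 200 times, cue for the frame seen 11 times,
# assumed cue error rate 0.02
print(permits_frame(200, 11, 0.02))   # True: 11 spurious cues in 200 tries is unlikely
```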
11
11-11 Verb Subcategorization III
Brent's system does well at precision, but not at recall. Manning (1993)'s system addresses this problem by running the cue detection on the output of a part-of-speech tagger. Manning's method can learn a large number of subcategorization frames, even those that have only low-reliability cues. Manning's results are still relatively low, and one way to improve them is to use prior knowledge.
12
11-12 Simple Methods for Prepositional Phrases
The PP-attachment problem: in the pattern verb np1 (prep np2), does the PP attach to the verb or to np1?
– Sue bought a plant with Jane.  (attaches to the verb)
– Sue bought a plant with yellow leaves.  (attaches to np1)
Syntactic information alone is insufficient: a PCFG simply prefers parses that use common constructions, regardless of the words involved in the attachment decision.
13
11-13 Assumption:
P(A | prep, verb, np1, np2, rest of the text) ≈ P(A | prep, verb, np1, np2)
where "rest of the text" denotes the words outside of "verb np1 (prep np2)".
A: random variable representing the attachment decision; V(A), its possible values, are verb or np1.
14
11-14 Counterexample: the surrounding text can matter, e.g. "Fred saw a movie with Arnold Schwarzenegger." (earlier sentences may tell us whether Arnold was Fred's companion or an actor in the movie).
Further assumption (simplification): condition only on the head nouns noun1 and noun2 of np1 and np2:
P(A | prep, verb, np1, np2) ≈ P(A | prep, verb, noun1, noun2)
Total parameters: #(prep) × #(verb) × #(noun) × #(noun) ≈ 10^13
15
11-15 Further simplification: use statistics on a preposition's propensity to attach to the verb versus its propensity to attach to the noun:
P(A = noun1 | prep, verb, noun1) vs. P(A = verb | prep, verb, noun1)
Total parameters: #(prep) × #(verb) × #(noun1), since noun2 is no longer needed.
Two alternatives for reducing the number of parameters:
(1) Condition probabilities on fewer things.
(2) Condition probabilities on more general things.
16
11-16 Attachment Ambiguity
The preference bias for low attachment in the parse tree is formalized by Hindle and Rooth (1993). The model asks the following questions:
– VA_p: is there a PP headed by p and following the verb v which attaches to v (VA_p = 1) or not (VA_p = 0)?
– NA_p: is there a PP headed by p and following the noun n which attaches to n (NA_p = 1) or not (NA_p = 0)?
17
11-17
(1) Determine the attachment of a PP that immediately follows an object noun, i.e. compute the probability that NA_p = 1.
(2) In order for the first PP headed by the preposition p to attach to the verb, both VA_p = 1 and NA_p = 0 must hold.
Assume conditional independence of the two attachments: whether the verb (noun) is modified by the PP is independent of the noun (verb).
18
11-18 Likelihood ratio:
λ(v, n, p) = log2 [ P(VA_p = 1 | v) P(NA_p = 0 | n) / P(NA_p = 1 | n) ]
(1) Verb attachment: large positive value of λ
(2) Noun attachment: large negative value of λ
(3) Undecidable: λ is close to zero
Maximum likelihood estimation of P(VA_p = 1 | v) and P(NA_p = 1 | n):
P(VA_p = 1 | v) = C(v, p) / C(v);  P(NA_p = 1 | n) = C(n, p) / C(n)
where C(v) and C(n) are the numbers of occurrences of v and n, and C(v, p) and C(n, p) are the numbers of times that p attaches to v and to n.
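A sketch of the resulting decision rule with made-up counts (unsmoothed maximum likelihood estimates; Hindle and Rooth's actual system smooths these probabilities):

```python
from math import log2

def lambda_score(c_v, c_vp, c_n, c_np):
    """log2 likelihood ratio for verb vs. noun attachment of a PP headed by p.
    c_v, c_n   : total occurrences of the verb / the noun
    c_vp, c_np : occurrences in which p attaches to the verb / the noun
    (unsmoothed MLE; a real implementation needs smoothing for zero counts)."""
    p_va = c_vp / c_v          # P(VA_p = 1 | v)
    p_na = c_np / c_n          # P(NA_p = 1 | n)
    return log2(p_va * (1 - p_na) / p_na)

# made-up counts for (verb=send, noun=soldiers, prep=into)
lam = lambda_score(c_v=1000, c_vp=86, c_n=1500, c_np=3)
if lam > 2.0:
    decision = "verb attachment"
elif lam < -2.0:
    decision = "noun attachment"
else:
    decision = "undecided"
print(f"lambda = {lam:.2f} -> {decision}")   # lambda ~ 5.4 -> verb attachment
```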
19
11-19 Estimation of PP attachment counts
1. Sure Noun Attach: if a noun is followed by a PP but there is no preceding verb, increment C(prep attached to noun). E.g., the noun is in subject or preverbal position.
2. Sure Verb Attach 1: if a passive verb is followed by a PP other than a "by" phrase, increment C(prep attached to verb). E.g., The dog was hit on the leg.
3. Sure Verb Attach 2: if a PP follows both a noun phrase and a verb but the noun phrase is a pronoun, increment C(prep attached to verb). E.g., Sue saw him in the park.
4. Ambiguous Attach 1: if a PP follows both a noun and a verb, and the probabilities based on the attachments decided by 1-3 strongly favor one attachment (e.g., λ > 2.0 for verb attachment, λ < -2.0 for noun attachment), increment the corresponding counter.
5. Ambiguous Attach 2: otherwise, increment both attachment counters by 0.5.
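A rough sketch of how these heuristics could drive the two counters (the argument structure and the use of the likelihood-ratio threshold here are illustrative assumptions, not Hindle and Rooth's actual pipeline):

```python
from collections import Counter

C_verb = Counter()   # C(v, p): occurrences of p attached to verb v
C_noun = Counter()   # C(n, p): occurrences of p attached to noun n

def update_counts(verb, noun, prep, noun_is_pronoun=False,
                  verb_is_passive=False, lam=None):
    """One 'verb? noun prep ...' instance -> update the attachment counters.
    lam is the likelihood ratio computed from the counts collected so far."""
    if verb is None:                          # 1. Sure Noun Attach (no preceding verb)
        C_noun[noun, prep] += 1
    elif verb_is_passive and prep != "by":    # 2. Sure Verb Attach 1
        C_verb[verb, prep] += 1
    elif noun_is_pronoun:                     # 3. Sure Verb Attach 2
        C_verb[verb, prep] += 1
    elif lam is not None and lam > 2.0:       # 4. Ambiguous Attach 1 (verb favored)
        C_verb[verb, prep] += 1
    elif lam is not None and lam < -2.0:      # 4. Ambiguous Attach 1 (noun favored)
        C_noun[noun, prep] += 1
    else:                                     # 5. Ambiguous Attach 2: split the count
        C_verb[verb, prep] += 0.5
        C_noun[noun, prep] += 0.5

update_counts(None, "dog", "in")                          # sure noun attach
update_counts("saw", "him", "in", noun_is_pronoun=True)   # sure verb attach
print(C_verb, C_noun)
```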
20
11-20 Example: Moscow sent more than 100,000 soldiers into Afghanistan … (attachment of the PP to the verb is much more likely)
21
11-21 Evaluation results:
                         Choose noun   Choose verb   Total   Percent correct
  Choose noun (always)        889            0        889        64.0
  2 human judges              551          338        889        85.7
  Program                     565          354        919        78.1
  Program > 95%               407          201        608        85.0
  Human > 95%                 416          192        608        86.9
No significant difference between the last two rows (program vs. human judges on the cases where each was more than 95% sure).
Sparse data is a major cause of the difference between the human judges' performance and that of the program.
22
11-22 Using Semantic Information
Condition on the semantic tags of the verb and the noun. Example tags:
– artist, Jane, plumber, Ted (human)
– Friday, June, yesterday (time)
– hammer, leaves, pot (object)
Sue bought a plant with Jane.  (human: attach to VP)
Sue bought a plant with yellow leaves.  (object: attach to NP)
23
11-23 General Remarks on PP Attachment
There are some limitations to the method by Hindle and Rooth:
– Sometimes information other than v, n and p is useful.
– There are other types of PP attachment than the basic case of a PP immediately after an NP object.
– There are other types of attachments altogether: N N N or V N P. The Hindle and Rooth formalism is more difficult to apply in these cases because of data sparsity.
– In certain cases, there is attachment indeterminacy.
24
11-24 Relative-clause Attachment
Non-restrictive relative clauses:
(1) Fred awarded a prize to the dog and Bill, who trained it.
(2) Fred awarded a prize to Sue and Fran, who sang a great song.
Restrictive relative clauses:
(3) Fred awarded a prize for penmanship that was worth $50.50.
(4) Fred awarded a prize for the dog that ran the fastest.
25
11-25 [Parse tree for "to Sue and Fran, who sang a great song", with S, NP, VP and PP nodes showing the candidate attachment sites of the relative clause.]
26
11-26 Determining the referent of a relative pronoun = determining the noun phrase to which it is attached.
– The dog that e ate the cookie  (subject)
– The dog that Sue bought e  (object)
– The dog that Fred gave the biscuit to e  (NP of a PP)
Where does the head noun fit into the relative clause?
– Sue ate a cookie that was baked e by Fran.
Assumption (Fisher & Riloff): the noun phrase serves as the subject of the relative clause.
27
11-27
(1) Collect "subject-verb" and "verb-object" pairs. (training part)
(2) Compute the t-score. (testing part) A t-score > 0.10 is considered significant.
For possible attachment points x, y, z, … compare:
P(relative clause attaches to x | main verb of clause = v) > P(relative clause attaches to y | main verb of clause = v)
i.e., P(x = subject/object | v) > P(y = subject/object | v)
28
11-28
1. If the probabilities for all attachments were not significant, then the attachment was left unresolved.
2. If there was at least one significant score, the highest was chosen.
3. If there was a tie, then the rightmost attachment was chosen.
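A tiny sketch of this decision procedure, assuming the t-scores for the candidate attachment points have already been computed and the candidates are listed left to right:

```python
def choose_attachment(candidates, threshold=0.10):
    """candidates: list of (noun_phrase, t_score) pairs in left-to-right order.
    Returns the chosen attachment point, or None if unresolved."""
    significant = [(np_, t) for np_, t in candidates if t > threshold]
    if not significant:
        return None                      # rule 1: leave unresolved
    best = max(t for _, t in significant)
    tied = [np_ for np_, t in significant if t == best]
    return tied[-1]                      # rules 2-3: highest score, rightmost on ties

print(choose_attachment([("Sue", 0.05), ("Fran", 0.27)]))   # 'Fran'
```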
29
11-29 [Results with small training data, compared under three settings: semantic tag only; fixed topic & semantic tag; general corpus & semantic tag.]
30
11-30 Uniform Use of Lexical / Semantic Information
Generalize to other ambiguities, e.g., noun-noun and adjective-noun bracketing:
– song bird feeder kit: "song" groups with "bird" ([song bird] feeder kit)
– metal bird feeder kit: "metal" groups with "bird feeder" (metal [bird feeder] kit)
31
11-31 Selectional Preferences I
Most verbs prefer arguments of a particular type. Such regularities are called selectional preferences or selectional restrictions.
– eat: object is a food item
– think: subject is a person
– bark: subject is a dog
Selectional preferences are useful for a couple of reasons:
– If a word is missing from our machine-readable dictionary, aspects of its meaning can be inferred from selectional restrictions: Susan had never eaten a fresh durian before. (food item)
– Selectional preferences can be used to rank different parses of a sentence.
32
11-32 Selectional Preferences II
Resnik's (1993, 1996) approach to selectional preferences uses the notions of selectional preference strength and selectional association.
Selectional preference strength S(v) measures how strongly the verb constrains its direct object:
S(v) = D( P(C|v) || P(C) ) = Σ_c P(c|v) log ( P(c|v) / P(c) )
where P(C) is the overall probability distribution of noun classes and P(C|v) is the probability distribution of noun classes in the direct object position of v.
(The same idea applies to other relations: verb-subject, verb-direct object, verb-prepositional phrase, adjective-noun, noun-noun, always with respect to the head noun.)
33
11-33 Selectional Preferences II S(v) is defined as the KL divergence between the prior distribution of direct objects (for verbs in general) and the distribution of direct objects of the verb we are trying to characterize. We make 2 assumptions in this model: 1) only the head noun of the object is considered; 2) rather than dealing with individual nouns, we look at classes of nouns.
34
11-34 Selectional Preferences III
The selectional association A(v, c) between a verb and a noun class is defined as the proportion of the overall preference strength S(v) contributed by that class:
A(v, c) = P(c|v) log ( P(c|v) / P(c) ) / S(v)
35
11-35 Selectional Preferences III
There is also a rule for assigning association strengths to nouns as opposed to noun classes. If a noun is in a single class, its association strength is that of its class. If it belongs to several classes, its association strength is that of its highest-scoring class. Finally, there is a rule for estimating the probability that a direct object in noun class c occurs given a verb v.
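A compact sketch of these definitions with invented toy distributions (real estimates come from a parsed corpus and WordNet noun classes, which are not reproduced here):

```python
from math import log2

def preference_strength(p_c, p_c_given_v):
    """S(v): KL divergence between P(C | v) and the prior P(C)."""
    return sum(p_c_given_v[c] * log2(p_c_given_v[c] / p_c[c])
               for c in p_c_given_v if p_c_given_v[c] > 0)

def selectional_association(c, p_c, p_c_given_v):
    """A(v, c): class c's share of the preference strength S(v)."""
    s_v = preference_strength(p_c, p_c_given_v)
    return p_c_given_v[c] * log2(p_c_given_v[c] / p_c[c]) / s_v

def noun_association(noun_classes, p_c, p_c_given_v):
    """A(v, n): for a noun in several classes, take the highest-scoring class."""
    return max(selectional_association(c, p_c, p_c_given_v) for c in noun_classes)

# toy distributions over noun classes for the verb 'eat'
prior     = {"food": 0.10, "people": 0.30, "furniture": 0.20, "action": 0.40}
given_eat = {"food": 0.85, "people": 0.10, "furniture": 0.03, "action": 0.02}

print(preference_strength(prior, given_eat))               # S(eat): fairly large (~2.3)
print(selectional_association("food", prior, given_eat))   # > 1 here: other classes contribute negatively
```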
36
11-36 Example: Susan interrupted the chair.
chair: furniture vs. people (i.e., chairperson)
A(interrupt, people) >> A(interrupt, furniture)
A(interrupt, chair) = max over the classes of chair = A(interrupt, people)
37
11-37
(1) eat prefers food items (very specific): A(eat, food) = 1.08
(2) see has a uniform distribution (no selectional preference): A(see, people) = A(see, furniture) = A(see, food) = A(see, action) = 0
(3) find disprefers action items (less specific): A(find, action) = -0.13
38
11-38 [Table: typical objects vs. atypical objects for several verbs, with their selectional association scores.]
39
11-39 Semantic Similarity I
Text understanding and information retrieval could benefit greatly from a system able to acquire meaning. Meaning acquisition is not possible at this point, so people focus on assessing the semantic similarity between a new word and already known words.
Semantic similarity is not as intuitive and clear a notion as we might first think: synonymy? same semantic domain? contextual interchangeability?
Vector space versus probabilistic measures.
40
11-40 Semantic Similarity II: Vector Space Measures
Words can be represented as vectors in different spaces: document space, word space, and modifier space.
Similarity measures for binary vectors: matching coefficient, Dice coefficient, Jaccard (or Tanimoto) coefficient, overlap coefficient, and cosine.
Similarity measures for real-valued vectors: cosine, Euclidean distance, normalized correlation coefficient.
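A sketch of a few of these measures in plain Python (the count vectors and context sets below are made up):

```python
from math import sqrt

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))

def euclidean(x, y):
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def dice(s, t):
    return 2 * len(s & t) / (len(s) + len(t))

def jaccard(s, t):
    return len(s & t) / len(s | t)

# toy co-occurrence counts for two words over four context words
cocoa  = [12, 0, 3, 7]
coffee = [10, 1, 4, 5]
print(cosine(cocoa, coffee), euclidean(cocoa, coffee))

# binary measures: the sets of contexts each word occurs with
print(dice({"drink", "hot", "bean"}, {"drink", "hot", "cup"}),
      jaccard({"drink", "hot", "bean"}, {"drink", "hot", "cup"}))
```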
41
11-41 Semantic Similarity III: Probabilistic Measures
The problem with vector-space measures is that, aside from the cosine, they operate on binary data. The cosine, on the other hand, assumes a Euclidean space, which is not well motivated when dealing with word counts. A better way of viewing word counts is as probability distributions. Two probability distributions can then be compared using the following measures: KL divergence, information radius (IRad), and the L1 norm.
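A sketch of the three probabilistic measures on toy distributions (KL divergence assumes the second distribution is nonzero wherever the first is; the information radius compares each distribution to their average):

```python
from math import log2

def kl(p, q):
    """KL divergence D(p || q); assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def information_radius(p, q):
    """IRad(p, q) = D(p || m) + D(q || m), where m is the average of p and q."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return kl(p, m) + kl(q, m)

def l1_norm(p, q):
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl(p, q), information_radius(p, q), l1_norm(p, q))
```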