
1 Lexical Acquisition
Extending our information about words, particularly quantitative information.

2 Why lexical acquisition?
"One cannot learn a new language by reading a bilingual dictionary." -- Mercer
–Parsing 'postmen' requires context
Quantitative information is difficult to collect by hand
–e.g., priors on word senses
Productivity of language
–Lexicons need to be updated for new words and usages

3 Machine-readable Lexicons contain...
Lexical vs. syntactic information
√ Word senses
–Classifications, subclassifications
√ Collocations
–Arguments, preferences
–Synonyms, antonyms
–Quantitative information

4 Gray area between lexical and syntactic
The rules of grammar are syntactic.
–S ::= NP V NP
–S ::= NP [V NP PP]
But which one to use, when?
–The children ate the cake with their hands.
–The children ate the cake with blue icing.

5 Outline of chapter
Verb subcategorization
–Which arguments (e.g., infinitive, DO) does a particular verb admit?
Attachment ambiguity
–What does the modifier refer to?
Selectional preferences
–Does a verb tend to restrict its object to a certain class?
Semantic similarity between words
–This new word is most like which words?

6 Verb subcategorization frames
Assign to each verb the subcategorization frames (SFs) legal for it (see diagram).
Crucial for parsing:
–She told the man where Peter grew up. (NP NP S)
–She found the place where Peter grew up. (NP NP)

7 Brent's method (1993)
Learn subcategorizations given a corpus, a lexical analyzer, and cues.
A cue is a pair (L, SF):
–L is a star-free regular expression over lexemes, e.g. (OBJ | SUBJ-OBJ | CAP) (PUNC | CC)
–SF is a subcategorization frame, e.g. NP
Strategy: find the verb SFs for which the cues provide strong evidence.
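A minimal sketch of checking such a cue, assuming the tokens following a verb have already been labeled with the slide's categories (OBJ, SUBJ-OBJ, CAP, PUNC, CC). For simplicity the pattern is matched over category labels rather than raw lexemes; all names here are illustrative assumptions, not Brent's actual implementation.

```python
import re

# A Brent-style cue: a star-free pattern over token categories, paired with the
# subcategorization frame it is evidence for. How tokens receive these labels
# (e.g. a pronoun/case table for OBJ, capitalization for CAP) is assumed here.
CUE_PATTERN = re.compile(r"^(OBJ|SUBJ-OBJ|CAP) (PUNC|CC)")
CUE_FRAME = "NP"   # the frame this cue supports: verb takes a direct-object NP

def cue_fires(categories_after_verb):
    """Return True if the cue matches the category labels right after a verb.

    categories_after_verb: list of category strings, e.g. ["OBJ", "PUNC", ...]
    """
    return CUE_PATTERN.match(" ".join(categories_after_verb)) is not None

# Example: "... greeted [him] [,] ..." -> pronoun object followed by punctuation
print(cue_fires(["OBJ", "PUNC"]))        # True  -> evidence for frame "NP"
print(cue_fires(["CAP", "OBJ", "CC"]))   # False -> cue requires PUNC or CC second
```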

8 Brent's method (cont'd)
Compute the error rate of the cue: E = Pr(false positive).
For each verb v and cue c = (L, SF), test the hypothesis H0 that verb v does not admit SF.
–p_E = Σ_{r=m}^{n} C(n, r) E^r (1 − E)^(n−r), where v occurs n times and m of those occurrences trigger the cue
If p_E < a threshold, reject H0.
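A sketch of the test, assuming the binomial-tail form of p_E given above: verb v occurs n times, m of those occurrences trigger cue c, and the cue's error rate is E. The threshold value here is an arbitrary illustration.

```python
from math import comb

def p_error(n, m, error_rate):
    """Probability of the cue firing m or more times among n occurrences of verb v
    purely by chance, under H0 (v does not admit SF):
    p_E = sum_{r=m}^{n} C(n, r) * E^r * (1 - E)^(n - r).
    """
    return sum(comb(n, r) * error_rate**r * (1 - error_rate)**(n - r)
               for r in range(m, n + 1))

def admits_frame(n, m, error_rate, alpha=0.02):
    """Reject H0 (conclude v admits the frame) when p_E < alpha.
    alpha is an assumed threshold, left as a parameter."""
    return p_error(n, m, error_rate) < alpha

# Example: verb seen 200 times, cue fired 9 times, cue error rate 1%
print(admits_frame(200, 9, 0.01))   # True: 9 firings are very unlikely by chance
```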

9 Subcategorization Frames: Ideas
Hypothesis testing gives high precision, low recall.
Unreliable cues are necessary and helpful (independence assumption).
Find SFs for verb classes, rather than individual verbs, using an error-prone tagger.
As long as the error estimates are incorporated into p_E, it still works well.
Manning did this, and improved recall.

10 Attachment Ambiguity: PPs
NP V NP PP -- does the PP modify V or NP?
Assumption: there is only one meaningful parse for each sentence.
x The children ate the cake with a spoon.
√ Bush sent 100,000 soldiers into Kuwait.
√ Brazil honored their deal with the IMF.
Straw man: compare co-occurrence counts for the pairs (v, p) and (n, p).

11 Bias defeats simple counting
Prob(into | send) > Prob(into | soldiers).
Sometimes there is a strong association between the PP and both V and NP:
–Ford ended its venture with Fiat.
In this case, there is a bias toward "low attachment" -- attaching the PP to the nearer referent, the NP.

12 Hindle and Rooth (1993)
An elegant (?) method of quantifying the low-attachment bias.
Express P(first PP after the object attaches to the object) and P(first PP after the object attaches to the verb) as functions of:
–P(NA) = P(there is a PP following the object that attaches to the object)
–P(VA) = P(there is a PP following the object that attaches to the verb)
Estimate P(NA) and P(VA) by counting.

13 Estimating P(NA) and P(VA)
Let v, n, p be a particular verb, noun, and preposition.
P(VA_p | v) = (# times p attaches to v) / (# occurrences of v)
P(NA_p | n) = (# times p attaches to n) / (# occurrences of n)
The two are treated as independent!
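A sketch of these count-based estimates, assuming the attachment counts have already been harvested from a bracketed (or heuristically bracketed) corpus; the table names and numbers are illustrative.

```python
from collections import Counter

# Illustrative count tables; in practice they would be filled by scanning a
# corpus for "verb ... object-noun preposition ..." patterns.
attach_to_verb = Counter({("send", "into"): 40})      # (v, p) -> # times p attached to v
attach_to_noun = Counter({("soldiers", "into"): 2})   # (n, p) -> # times p attached to n
verb_occurrences = Counter({"send": 100})
noun_occurrences = Counter({"soldiers": 80})

def p_va(p, v):
    """Estimate P(VA_p | v) = (# times p attaches to v) / (# occurrences of v)."""
    return attach_to_verb[(v, p)] / verb_occurrences[v]

def p_na(p, n):
    """Estimate P(NA_p | n) = (# times p attaches to n) / (# occurrences of n)."""
    return attach_to_noun[(n, p)] / noun_occurrences[n]

print(p_va("into", "send"), p_na("into", "soldiers"))   # 0.4, 0.025
```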

14 Attachment of first PP
P(Attach(p, n) | v, n) = P(NA_p | n)
–Whenever there is a PP attaching to the noun, the first such PP attaches to the noun!
P(Attach(p, v) | v, n) = P(not NA_p | n) * P(VA_p | v)
–Whenever there is no PP attaching to the noun AND there is a PP attaching to the verb...
–Otherwise the brackets would cross: I (put the [book on the table) on WW2]
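Combining the two estimates, a sketch of the attachment decision for the first PP; the function name and the example numbers are assumptions for illustration.

```python
def attachment_scores(p_na_given_n, p_va_given_v):
    """Hindle-and-Rooth-style scores for where the first PP after the object attaches.

    p_na_given_n: estimate of P(NA_p | n); p_va_given_v: estimate of P(VA_p | v).
    Returns (P(Attach(p, n) | v, n), P(Attach(p, v) | v, n)) under the slide's
    independence assumption.
    """
    attach_noun = p_na_given_n                        # if any PP attaches to n, the first one does
    attach_verb = (1 - p_na_given_n) * p_va_given_v   # verb gets the first PP only if the noun takes none
    return attach_noun, attach_verb

# Illustrative numbers for "sent 100,000 soldiers into Kuwait"
noun_score, verb_score = attachment_scores(p_na_given_n=0.025, p_va_given_v=0.4)
print("attach to verb" if verb_score > noun_score else "attach to noun")
# (1 - 0.025) * 0.4 = 0.39 > 0.025, so the PP attaches to the verb
```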

15 Selectional Preferences
Verbs prefer classes of subjects and objects:
–Objects of 'eat' tend to be food items
–Subjects of 'think' tend to be people
–Subjects of 'bark' tend to be dogs
Used to:
–disambiguate word sense
–infer the class of new words
–rank multiple parses

16 Disambiguate the class (Resnik)
–She interrupted the chair.
A(nc) = D(P(nc | v) || P(nc)) = P(nc | v) log(P(nc | v) / P(nc))
–relative entropy, or Kullback-Leibler distance
A(furniture) = P(furniture | interrupted) * log(P(furniture | interrupted) / P(furniture))

17 Estimating P(nc | v)
P(nc | v) = P(nc, v) / P(v)
–P(v) is estimated as the proportion of occurrences of v among all verb occurrences
–P(nc, v) is estimated as (1/N) Σ_{n in nc} C(v, n) / |classes(n)|
Now just take the class with the highest A(nc) as the maximum-likelihood word sense.
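A toy sketch of the whole procedure, assuming the class inventory and co-occurrence counts are given as plain dictionaries; the names, counts, and base-2 logarithm are illustrative choices (the log base does not affect which class wins).

```python
from math import log2

def association(verb, noun_classes, counts_vn, classes_of):
    """Score each candidate noun class nc as on the slides:
    A(nc) = P(nc | v) * log(P(nc | v) / P(nc)).

    counts_vn: dict (verb, noun) -> co-occurrence count C(v, n)
    classes_of: dict noun -> set of classes the noun can belong to
    noun_classes: dict class -> set of nouns in the class
    """
    N = sum(counts_vn.values())           # total verb-object pairs in the corpus

    def p_joint(nc, v):
        # P(nc, v) ~ (1/N) * sum_{n in nc} C(v, n) / |classes(n)|
        return sum(counts_vn.get((v, n), 0) / len(classes_of[n])
                   for n in noun_classes[nc]) / N

    def p_class(nc):
        # P(nc): the same estimate summed over all verbs
        return sum(p_joint(nc, v) for v in {v for v, _ in counts_vn})

    p_verb = sum(c for (v, _), c in counts_vn.items() if v == verb) / N
    scores = {}
    for nc in noun_classes:
        p_nc_given_v = p_joint(nc, verb) / p_verb
        p_nc = p_class(nc)
        if p_nc_given_v > 0 and p_nc > 0:
            scores[nc] = p_nc_given_v * log2(p_nc_given_v / p_nc)
    return scores

# Toy example for "She interrupted the chair": is 'chair' a person or furniture?
counts = {("interrupted", "speaker"): 8, ("interrupted", "chair"): 2,
          ("sat-on", "chair"): 9, ("sat-on", "sofa"): 6}
classes = {"person": {"speaker", "chair"}, "furniture": {"chair", "sofa"}}
member_of = {"speaker": {"person"}, "chair": {"person", "furniture"},
             "sofa": {"furniture"}}
scores = association("interrupted", classes, counts, member_of)
print(max(scores, key=scores.get))   # -> "person"
```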

18 Semantic similarity
Uses:
–classifying a new word
–expanding queries in IR
Are two words similar...
–when they are used together? IMF and Brazil
–when they are on the same topic? astronaut and spacewalking
–when they function interchangeably? Soviet and American
–when they are synonymous? astronaut and cosmonaut

19 Cosine is no panacea
Cosine corresponds to Euclidean distance between points (once the vectors are length-normalized).
Should document-space vectors be treated as points?
Alternative: treat them as probability distributions (after normalizing).
Then there is no particular reason to use cosine -- why not try an information-theoretic approach?

20 Alternative distance metrics to cosine
Cosine of square roots (Goldszmidt)
L1 norm -- Manhattan distance
–sum of the absolute values of the component differences
KL distance
–D(p || q)
Mutual information (why not?)
–D(p ∧ q || p q), i.e., the joint distribution against the product of the marginals
Information radius -- the information lost when describing both p and q by their midpoint m = (p + q)/2
–IRad(p, q) = D(p || m) + D(q || m)
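For concreteness, minimal implementations of these measures over two discrete distributions given as equal-length lists; the function names are assumptions, and the KL distance here assumes q is nonzero wherever p is.

```python
from math import log2, sqrt

def l1(p, q):
    """Manhattan distance: sum of absolute component differences."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def kl(p, q):
    """KL distance D(p || q) in bits; 0 * log 0 is taken as 0, and q is assumed
    nonzero wherever p is."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def irad(p, q):
    """Information radius: D(p || m) + D(q || m) with m the midpoint of p and q.
    Unlike KL distance it is symmetric and always finite."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return kl(p, m) + kl(q, m)

def cosine_of_sqrt(p, q):
    """The 'cosine of square roots' variant from the slide: since the square-rooted
    distributions already have unit length, this is just sum_i sqrt(p_i * q_i)."""
    return sum(sqrt(pi * qi) for pi, qi in zip(p, q))

p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]
print(l1(p, q), kl(p, q), irad(p, q), cosine_of_sqrt(p, q))
```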

