1 Preposition Phrase Attachment To what previous verb or noun phrase does a prepositional phrase (PP) attach? Example (the slide's attachment diagram, flattened): The woman saw a man, with candidate PPs "with a poodle", "in the park", "with a telescope", "on Tuesday", "on his bicycle"; "with a poodle" could attach to either "the woman" or "a man".

2 A Simplified Version Assume ambiguity only between preceding base NP and preceding base VP: The woman had seen the man with the telescope. Q: Does the PP attach to the NP or the VP? Assumption: Consider only NP/VP head and the preposition

3 Simple Formulation Determine attachment based on the log-likelihood ratio: LLR(v, n, p) = log P(p | v) - log P(p | n). If LLR > 0, attach to the verb; if LLR < 0, attach to the noun.
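
As a minimal illustration of this decision rule (the probabilities below are made-up toy values, not estimates from any corpus):

```python
import math

def llr(p_prep_given_verb, p_prep_given_noun):
    """Log-likelihood ratio for PP attachment: positive -> verb, negative -> noun."""
    return math.log(p_prep_given_verb) - math.log(p_prep_given_noun)

# Toy probabilities for attaching "with" after "saw ... the man":
score = llr(p_prep_given_verb=0.05, p_prep_given_noun=0.02)
print("verb" if score > 0 else "noun", round(score, 3))  # verb, since LLR > 0
```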

4 Issues Multiple attachment: – Attachment lines cannot cross. Proximity: – Preference for attaching to closer structures, all else being equal. Chrysler will end its troubled venture with Maserati. P(with | end) = 0.118, P(with | venture) = 0.107 !!!

5 Hindle & Rooth (1993) Consider just sentences with a transitive verb and a PP, i.e., of the form: ... bVP bNP PP ... (a base VP followed by a base NP and then a PP). Q: Where does the first PP attach (NP or VP)? Indicator variables (0 or 1): VA_p: Is there a PP headed by p after v attached to v? NA_p: Is there a PP headed by p after n attached to n? NB: Both variables can be 1 in a sentence.

6 Attachment Probabilities P(attach(p) = n | v, n) = P(NA_p = 1 | n) – Verb attachment is irrelevant; if it attaches to the noun it cannot attach to the verb. P(attach(p) = v | v, n) = P(VA_p = 1, NA_p = 0 | v, n) = P(VA_p = 1 | v) P(NA_p = 0 | n) – Noun attachment is relevant, since the noun 'shadows' the verb (by the proximity principle).
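
A small sketch of these two attachment probabilities in Python (the function and argument names are mine; the numbers reuse the toy figures from the Chrysler example purely for illustration):

```python
def attachment_probs(p_va, p_na):
    """Hindle & Rooth-style attachment probabilities.

    p_va = P(VA_p = 1 | v): prob. that a PP headed by p attaches to the verb v
    p_na = P(NA_p = 1 | n): prob. that a PP headed by p attaches to the noun n
    """
    p_attach_noun = p_na                    # verb attachment is irrelevant here
    p_attach_verb = p_va * (1.0 - p_na)     # the noun must NOT take the PP
    return {"noun": p_attach_noun, "verb": p_attach_verb}

print(attachment_probs(p_va=0.118, p_na=0.107))  # noun wins once the 'shadowing' is accounted for
```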

7 Estimating Parameters MLE: P(VA_p = 1 | v) = C(v, p) / C(v), P(NA_p = 1 | n) = C(n, p) / C(n). Using an unlabeled corpus: – Bootstrap from unambiguous cases: The road from Chicago to New York is long. She went from Albany towards Buffalo.
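
A sketch of the MLE step, with toy counts standing in for what would be harvested from unambiguous attachments in a corpus:

```python
from collections import Counter

# Toy counts from unambiguous cases (illustrative values only):
C_vp = Counter({("end", "with"): 12})      # C(v, p)
C_v  = Counter({"end": 100})               # C(v)
C_np = Counter({("venture", "with"): 8})   # C(n, p)
C_n  = Counter({"venture": 80})            # C(n)

def p_va(v, p):
    """MLE of P(VA_p = 1 | v) = C(v, p) / C(v)."""
    return C_vp[(v, p)] / C_v[v]

def p_na(n, p):
    """MLE of P(NA_p = 1 | n) = C(n, p) / C(n)."""
    return C_np[(n, p)] / C_n[n]

print(p_va("end", "with"), p_na("venture", "with"))  # 0.12 0.1
```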

8 Unsupervised Training 1. Build an initial model using only unambiguous attachments. 2. Apply the initial model and assign attachments where the LLR is above a threshold. 3. Divide the remaining ambiguous cases as 0.5 counts for each possibility. Use of EM as a principled method?
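
A self-contained sketch of this bootstrap (the data format, the threshold value, and the small smoothing constant are my assumptions, not from the slides):

```python
import math
from collections import Counter

def bootstrap(unambiguous, ambiguous, threshold=2.0):
    """unambiguous: list of (site, head, prep) with site in {"verb", "noun"};
    ambiguous: list of (v, n, p) triples whose attachment is unknown."""
    pair, head = Counter(), Counter()

    def add(site, h, p, w=1.0):
        pair[(site, h, p)] += w
        head[(site, h)] += w

    def prob(site, h, p):
        # tiny smoothing just to keep log() defined in this sketch
        return (pair[(site, h, p)] + 0.5) / (head[(site, h)] + 1.0)

    for site, h, p in unambiguous:              # 1. initial model from unambiguous cases
        add(site, h, p)
    for v, n, p in ambiguous:
        llr = math.log(prob("verb", v, p)) - math.log(prob("noun", n, p))
        if llr > threshold:                     # 2. confident verb attachment
            add("verb", v, p)
        elif llr < -threshold:                  #    confident noun attachment
            add("noun", n, p)
        else:                                   # 3. split the rest 0.5 / 0.5
            add("verb", v, p, 0.5)
            add("noun", n, p, 0.5)
    return pair, head
```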

9 Limitations Semantic issues: I examined the man with a stethoscope. I examined the man with a broken leg. Other contextual features: Superlative adjectives (biggest) indicate NP More complex sentences: The board approved its acquisition by BigCo of Milwaukee for $32 a share at its meeting on Tuesday.

10 Memory-Based Formulation Each example has four components: V N1 P N2, e.g. examine / man / with / stethoscope, Class = V. Similarity based on information-gain weighting for matching components. Need a 'semantic' similarity measure for words: stethoscope ~ thermometer, kidney ~ leg
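
A minimal sketch of this memory-based classifier using weighted exact-match overlap (the stored examples and the per-position weights are toy values; the approach described on the slide would use information-gain weights and a semantic word similarity rather than exact matching):

```python
def weighted_overlap(a, b, weights):
    """Similarity = sum of the weights of positions where the two 4-tuples match."""
    return sum(w for x, y, w in zip(a, b, weights) if x == y)

# Memory of (V, N1, P, N2) -> class, plus toy per-position weights:
memory = {
    ("examine", "man", "with", "stethoscope"): "V",
    ("examine", "man", "with", "leg"): "N",
}
weights = [0.2, 0.1, 0.4, 0.3]

def classify(example):
    """1-nearest-neighbour over the stored examples."""
    return max(memory.items(), key=lambda kv: weighted_overlap(example, kv[0], weights))[1]

print(classify(("examine", "patient", "with", "stethoscope")))  # -> "V"
```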

11 MVDM Word Similarity Idea: Words are similar to the extent that they predict similar class distributions. Data sparseness is a serious problem, though! Extend the idea to a task-independent similarity metric...
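
One way to write the MVDM idea down (the class distributions below are invented for illustration):

```python
def mvdm(value1, value2, class_dist):
    """MVDM distance: two feature values are close if they predict similar class
    distributions.  class_dist[value][c] approximates P(class = c | value)."""
    classes = set(class_dist[value1]) | set(class_dist[value2])
    return sum(abs(class_dist[value1].get(c, 0.0) - class_dist[value2].get(c, 0.0))
               for c in classes)

class_dist = {                        # toy P(class | word) estimates
    "stethoscope": {"V": 0.9, "N": 0.1},
    "thermometer": {"V": 0.8, "N": 0.2},
    "leg":         {"V": 0.1, "N": 0.9},
}
print(mvdm("stethoscope", "thermometer", class_dist))  # ~0.2 (similar values)
print(mvdm("stethoscope", "leg", class_dist))          # ~1.6 (dissimilar values)
```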

12 Lexical Space Represent the 'semantics' of a word by frequencies of the words which coöccur with it, instead of relative frequencies of classes. Each word has 4 vectors of frequencies for words 2 before, 1 before, 1 after, and 2 after. Examples: IN: for (0.05), since (0.10), at (0.11), after (0.11), under (0.11). GROUP: network (0.08), farm (0.11), measure (0.11), package (0.11), chain (0.11), club (0.11), bill (0.11). JAPAN: china (0.16), france (0.16), britain (0.19), canada (0.19), mexico (0.19), india (0.19), australia (0.19), korea (0.22).
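
A sketch of building these four positional co-occurrence vectors from tokenized sentences (raw counts only; the weighting behind the numbers shown on the slide is not reconstructed here):

```python
from collections import Counter, defaultdict

def lexical_space(sentences):
    """For every word, build four co-occurrence vectors: words found 2-left, 1-left,
    1-right, and 2-right of it."""
    positions = {"-2": -2, "-1": -1, "+1": 1, "+2": 2}
    vectors = defaultdict(lambda: {p: Counter() for p in positions})
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for name, offset in positions.items():
                j = i + offset
                if 0 <= j < len(tokens):
                    vectors[w][name][tokens[j]] += 1
    return vectors

vecs = lexical_space([["the", "group", "in", "japan", "met"],
                      ["a", "group", "in", "france", "met"]])
print(vecs["in"]["-1"])   # Counter({'group': 2}): 'group' occurs immediately before 'in'
```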

13 Results Baseline comparisons: – Humans (4-tuple): 88.2% – Humans (full sentence): 93.2% – Noun always: 59.0% – Most likely for prep: 72.2% Without Info Gain: 83.7% With Info Gain: 84.1%

14 Using Many Features Use many features of an example together. Consider interaction between features during learning. Each example is represented as a feature vector: x = (f_1, f_2, ..., f_n)

15 Geometric Interpretation (figure comparing kNN and linear-separator learning)

16 Linear Separators The linear separator model is a vector of weights: w = (w_1, w_2, ..., w_n). Binary classification: is w^T x > 0? – 'Positive' and 'Negative' classes. A threshold other than 0 is possible by adding a dummy element of "1" to all vectors; the threshold is then absorbed into the weight for that element.
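
A minimal sketch of this model and of folding the threshold into a constant "1" feature (the weights below are arbitrary):

```python
def predict(w, x):
    """Linear separator: classify as positive iff the dot product w . x exceeds 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) > 0

def add_bias(x):
    """Append the dummy '1' element so a nonzero threshold becomes just another weight."""
    return list(x) + [1.0]

w = [0.5, -0.2, 1.0, -0.8]               # the last weight absorbs the threshold
print(predict(w, add_bias([1, 0, 1])))   # True: 0.5 + 1.0 - 0.8 > 0
```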

17 Error-Based Learning 1. Initialize w to be all 1's. 2. Cycle x through the examples repeatedly (random order): If w^T x > 0 but x is really negative, then decrease w's elements. If w^T x < 0 but x is really positive, then increase w's elements.
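
A perceptron-style sketch of this error-driven loop (the learning rate, the number of passes, and treating a score of exactly 0 as negative are my choices, not from the slides):

```python
import random

def error_driven_train(examples, n_features, rate=0.1, epochs=10, seed=0):
    """examples: list of (x, label) with x a feature-value list and label +1 or -1."""
    rng = random.Random(seed)
    w = [1.0] * n_features                 # 1. initialize w to all 1's
    for _ in range(epochs):
        rng.shuffle(examples)              # 2. cycle through examples in random order
        for x, label in examples:
            score = sum(wi * xi for wi, xi in zip(w, x))
            if score > 0 and label < 0:    # predicted positive but really negative
                w = [wi - rate * xi for wi, xi in zip(w, x)]   # decrease weights
            elif score <= 0 and label > 0: # predicted negative but really positive
                w = [wi + rate * xi for wi, xi in zip(w, x)]   # increase weights
    return w
```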

18 Winnow 1. Initialize w to be all 1's. 2. Cycle x through the examples repeatedly (random order): a) If w^T x < 0 but x is really positive, then promote: multiply the weights of the active features by a factor α > 1. b) If w^T x > 0 but x is really negative, then demote: divide the weights of the active features by α.
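
The update formulas did not survive the transcript, so the sketch below uses the standard multiplicative Winnow updates (the promotion/demotion factor alpha and the positive threshold are my assumptions; features are assumed binary):

```python
import random

def winnow_train(examples, n_features, alpha=2.0, epochs=10, seed=0):
    """examples: list of (x, label) with x a 0/1 feature list and label +1 or -1."""
    rng = random.Random(seed)
    w = [1.0] * n_features                      # 1. initialize w to all 1's
    threshold = n_features / 2.0                # assumption: Winnow uses a positive threshold
    for _ in range(epochs):
        rng.shuffle(examples)                   # 2. cycle through examples in random order
        for x, label in examples:
            score = sum(wi * xi for wi, xi in zip(w, x))
            if score > threshold and label < 0:     # demote: shrink weights of active features
                w = [wi / alpha if xi else wi for wi, xi in zip(w, x)]
            elif score <= threshold and label > 0:  # promote: grow weights of active features
                w = [wi * alpha if xi else wi for wi, xi in zip(w, x)]
    return w
```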

19 Issues No negative weights possible! – Balanced Winnow: Formulate the weights as the difference of 2 weight vectors: w = w+ - w-. Learn each vector separately, w+ regularly and w- with the polarity reversed. Multiple classes: – Learn one weight vector for each class (learning X vs. not-X). – Choose the class whose result has the highest value for the example.
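
A short sketch of prediction with a balanced (two-vector) weight model and one-vs-rest class selection (the toy weights are arbitrary):

```python
def balanced_score(w_pos, w_neg, x):
    """Balanced Winnow: the effective weight vector is w+ minus w-."""
    return sum((wp - wn) * xi for wp, wn, xi in zip(w_pos, w_neg, x))

def predict_multiclass(models, x):
    """One weight-vector pair per class ("X vs. not-X"); pick the class with the highest score."""
    return max(models, key=lambda c: balanced_score(models[c][0], models[c][1], x))

models = {                       # toy (w+, w-) pairs for two classes (illustrative only)
    "V": ([2.0, 1.0, 0.5], [0.5, 1.0, 1.0]),
    "N": ([0.5, 2.0, 1.0], [1.0, 0.5, 1.0]),
}
print(predict_multiclass(models, [1, 0, 1]))   # "V": its effective score is higher here
```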

20 PP Attachment Features Words in each position. Subsets of the above, e.g.: Word classes at various levels of generality: stethoscope -> medical instrument -> instrument -> device -> instrumentation -> artifact -> object -> physical thing – Derived from WordNet, a handmade lexicon. 15 basic features plus word-class features.
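
If the word classes were taken straight from WordNet, they could be pulled with NLTK roughly as below (this assumes nltk and its wordnet data are installed, and it only follows the first sense's first hypernym link, so it is an approximation of whatever lexicon the slides actually used):

```python
from nltk.corpus import wordnet as wn   # needs: pip install nltk; nltk.download('wordnet')

def hypernym_chain(word):
    """Walk up the hypernym links of the first noun sense to get ever more general classes."""
    synsets = wn.synsets(word, pos=wn.NOUN)
    if not synsets:
        return [word]
    chain, node = [word], synsets[0]
    while node.hypernyms():
        node = node.hypernyms()[0]
        chain.append(node.lemma_names()[0])
    return chain

print(hypernym_chain("stethoscope"))
# e.g. ['stethoscope', 'medical_instrument', 'instrument', 'device', ...]
```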

21 Results Results without the preposition of: Base 58.1, Word 77.4, +1 77.2, +5 79.1, +10 78.5, +15 78.6. Results including the preposition of: Winnow 84.8, MBL 84.4, Backoff 84.5, Transform 81.9.

