
1/17 Acquiring Selectional Preferences from Untagged Text for Prepositional Phrase Attachment Disambiguation
Hiram Calvo and Alexander Gelbukh
Presented by Igor A. Bolshakov, Center for Computing Research, National Polytechnic Institute, Mexico

2/17 Introduction
Entities must be identified adequately for database representation:
– See the cat with a telescope
– See [the cat] [with a telescope]: 2 entities
– See [the cat with a telescope]: 1 entity
This problem is known as Prepositional Phrase (PP) attachment disambiguation.

3/17 Existing methods - 1
Accuracy when using treebank statistics:
– Ratnaparkhi et al., Brill and Resnik: up to 84%
– Kudo and Matsumoto: 95.8% (needed weeks for training)
– Lüdtke and Sato: 94.9% (only 3 hours for training)
But there are no treebanks for many languages!

4/17 Existing methods - 2
Based on untagged text:
– Calvo and Gelbukh, 2003: 82.3% accuracy
– Uses the web as corpus: slow (up to 18 queries for each PP attachment ambiguity)
Does this method work with very big local corpora?

5/17 Using a big local corpus
Corpus:
– 3 years of publication of 4 newspapers
– 161 million words
– 61 million sentences
Results:
– Recall: 36%, Precision: 67%
– Disappointing!

6/17 What do we want?
To solve PP attachment disambiguation with:
– Local corpora, not the web
– No treebanks
– No supervision
– High precision and recall
Solution proposed: Selectional Preferences

7/17 Selectional Preferences
The problem of I see a cat with a telescope turns into I see {animal} with {instrument}

8/17 Sources for noun semantic classification
– Machine-readable dictionaries
– WordNet ontology: we use the top 25 unique beginner concepts of WordNet
Examples: mouse is-a {animal}, ranch is-a {place}, root is-a {part}, reality is-a {attribute}, race is-a {grouping}, etc.
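The classification step can be sketched as follows. This is not the authors' code: the tiny hypernym table below merely stands in for WordNet's noun hierarchy, which real code would query instead, and the sample entries are invented for illustration.

```python
# Toy stand-in for WordNet's noun hypernym hierarchy (an assumption,
# not real WordNet data).
HYPERNYM = {"mouse": "rodent", "rodent": "animal",
            "ranch": "farm", "farm": "place",
            "telescope": "instrument"}

# A few of the 25 unique beginner concepts mentioned on the slide.
TOP_CONCEPTS = {"animal", "place", "instrument", "part", "attribute", "grouping"}

def top_concept(noun):
    """Walk up the hypernym chain until a unique beginner is reached."""
    while noun in HYPERNYM and noun not in TOP_CONCEPTS:
        noun = HYPERNYM[noun]
    return "{%s}" % noun if noun in TOP_CONCEPTS else None
```

With this table, `top_concept("mouse")` climbs mouse → rodent → animal and returns `{animal}`, matching the slide's example.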

9/17 Extracting Selectional Preferences
– Text is shallow parsed
– Subordinate sentences are separated
– Patterns are searched:
1. Verb NEAR Preposition NEXT_TO Noun
2. Verb NEAR Noun
3. Noun NEAR Verb
4. Noun NEXT_TO Preposition NEXT_TO Noun
– All nouns are classified
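A minimal sketch of how one of the pattern searches might look, assuming the shallow parser yields (token, POS) pairs; NEAR is approximated here by a small look-ahead window and NEXT_TO by adjacency. All of this is an assumption, not the authors' implementation.

```python
def find_vpn(tokens):
    """Pattern 1: Verb NEAR Preposition NEXT_TO Noun.
    tokens is a list of (word, pos) pairs with pos in {"V", "P", "N"}."""
    hits = []
    for i, (word, pos) in enumerate(tokens):
        if pos == "V":
            # NEAR: look a few tokens ahead of the verb
            for j in range(i + 1, min(i + 5, len(tokens) - 1)):
                # NEXT_TO: preposition immediately followed by a noun
                if tokens[j][1] == "P" and tokens[j + 1][1] == "N":
                    hits.append((word, tokens[j][0], tokens[j + 1][0]))
    return hits
```

On the shallow-parsed clause "see cat with telescope" this yields the triple (see, with, telescope), which the classification step then turns into see,with,{instrument}.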

10/17 Example
Consider this toy corpus:
– I see a cat with a telescope
– I see a ship in the sea with a spyglass
The following patterns are extracted:
– see,cat → see,{animal}
– see,with,telescope → see,with,{instrument}
– cat,with,telescope → {animal},with,{instrument}
– see,ship → see,{thing}
– see,in,sea → see,in,{place}
– see,with,spyglass → see,with,{instrument}
– ship,in,sea → {thing},in,{place}

11/17 Example
see,with,{instrument} has two occurrences
{animal},with,{instrument} has one occurrence
Thus, see with {instrument} is more probable than {animal} with {instrument}
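The counting above can be reproduced directly from the pattern instances extracted on the previous slide:

```python
from collections import Counter

# Pattern instances from the toy corpus, with nouns replaced by
# their top concepts (as on the previous slide).
patterns = [("see", "{animal}"),
            ("see", "with", "{instrument}"),
            ("{animal}", "with", "{instrument}"),
            ("see", "{thing}"),
            ("see", "in", "{place}"),
            ("see", "with", "{instrument}"),
            ("{thing}", "in", "{place}")]

counts = Counter(patterns)

# The verb-attachment pattern outscores the noun-attachment pattern,
# so the PP "with a telescope" attaches to the verb "see".
verb_attach = counts[("see", "with", "{instrument}")]       # 2 occurrences
noun_attach = counts[("{animal}", "with", "{instrument}")]  # 1 occurrence
```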

12/17 Experiment
Now, with a real corpus, we apply the formula shown on the slide, where:
– X can be a specific verb or a noun's semantic class (see or {animal})
– P is a preposition (with)
– C2 is the class of the second noun ({instrument})
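The formula itself appears only as an image in the original slides and is not reproduced in this transcript, so the sketch below is an assumption: it scores a triple (X, P, C2) by its relative frequency among all triples sharing the same head X.

```python
from collections import Counter

def preference(counts, x, prep, c2):
    """Hypothetical relative-frequency score for the triple (x, prep, c2):
    its count divided by the count of all triples headed by x.
    This stands in for the slide's formula, which is not transcribed."""
    total = sum(v for k, v in counts.items() if len(k) == 3 and k[0] == x)
    return counts[(x, prep, c2)] / total if total else 0.0
```

To disambiguate an attachment, one would compare the verb-headed score (X = see) against the noun-class-headed score (X = {animal}) for the same preposition and second-noun class, and attach the PP to the higher-scoring head.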

13/17 Experiment
From the 161-million-word corpus of Mexican Spanish newspapers, the system obtained:
– 893,278 selectional preferences for 5,387 verbs, and
– 55,469 noun patterns (like {animal} with {instrument})

14/17 Evaluation
We tested the obtained selectional preferences by performing PP attachment disambiguation on 546 sentences from the LEXESP corpus (in Spanish), then compared manually against the correct PP attachments.
Results: precision 78.2%, recall 76.0%

15/17 Conclusions
Results are not as good as those obtained by other methods (up to 95%)
But we don't need any costly resources, such as:
– Treebanks
– Manually annotated corpora
– The web as corpus

16/17 Future Work
– To use not only the 25 fixed semantic classes (top concepts) but the whole hierarchy
– To use a WSD (word sense disambiguation) module: currently, if a word belongs to more than one class, all classes are taken into account

17/17 Thank you!
hiram@sagitario.cic.ipn.mx
gelbukh@cic.ipn.mx

