1/17 Acquiring Selectional Preferences from Untagged Text for Prepositional Phrase Attachment Disambiguation
Hiram Calvo and Alexander Gelbukh
Presented by Igor A. Bolshakov
Center for Computing Research, National Polytechnic Institute, Mexico
2/17 Introduction
Entities must be identified adequately for database representation:
–See the cat with a telescope
–See [the cat] [with a telescope] → 2 entities
–See [the cat with a telescope] → 1 entity
This problem is known as Prepositional Phrase (PP) attachment disambiguation.
3/17 Existing methods - 1
Accuracy when using treebank statistics:
–Ratnaparkhi et al., Brill and Resnik: up to 84%
–Kudo and Matsumoto: 95.8% (weeks needed for training)
–Lüdtke and Sato: 94.9% (only 3 hours for training)
But there are no treebanks for many languages!
4/17 Existing methods - 2
Based on untagged text:
–Calvo and Gelbukh, 2003: 82.3% accuracy
–Uses the web as corpus: slow (up to 18 queries for each PP attachment ambiguity)
Does this method work with very big local corpora?
5/17 Using a big local corpus
Corpus:
–3 years of publication of 4 newspapers
–161 million words
–61 million sentences
Results:
–Recall: 36%, precision: 67%
–Disappointing!
6/17 What do we want?
To solve PP attachment disambiguation with:
–Local corpora, not the web
–No treebanks
–No supervision
–High precision and recall
Solution proposed: selectional preferences
7/17 Selectional Preferences
The problem of "I see a cat with a telescope" turns into "I see {animal} with {instrument}".
8/17 Sources for noun semantic classification
Machine-readable dictionaries
WordNet ontology:
–We use the top 25 unique beginner concepts of WordNet
–Examples: mouse is-a {animal}, ranch is-a {place}, root is-a {part}, reality is-a {attribute}, race is-a {grouping}, etc.
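A minimal sketch of this noun-to-class lookup, assuming Python with NLTK's WordNet interface (not the authors' tooling); modern WordNet exposes a similar top-level division through lexicographer file names such as noun.animal, used here as a stand-in for the 25 unique beginner concepts:

    # Sketch: map a noun to its top-level WordNet classes.
    # Assumes NLTK is installed and nltk.download('wordnet') has been run.
    from nltk.corpus import wordnet as wn

    def top_classes(noun):
        """Approximate the 25 unique beginners via lexicographer file names."""
        return {s.lexname() for s in wn.synsets(noun, pos=wn.NOUN)}

    # A polysemous noun falls into several classes; without WSD,
    # all of them are kept (see slide 16).
    print(top_classes("mouse"))  # e.g. {'noun.animal', 'noun.artifact'}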
9/17 Extracting Selectional Preferences
Text is shallow-parsed
Subordinate clauses are separated
The following patterns are searched (sketched below):
1. Verb NEAR Preposition NEXT_TO Noun
2. Verb NEAR Noun
3. Noun NEAR Verb
4. Noun NEXT_TO Preposition NEXT_TO Noun
All nouns are classified
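A rough illustration of the pattern search, assuming sentences already shallow-parsed into (head, tag) chunks; the NEAR window size, the adjacency tests, and the treatment of pattern 3 as an adjacent noun–verb pair are simplifications, not the paper's exact operators:

    # Sketch: extract (head1, preposition, head2) triples from chunks.
    # Tags: 'V' verb, 'P' preposition, 'N' noun.
    NEAR = 3  # assumed window size for the NEAR operator

    def extract_patterns(chunks):
        patterns = []
        for i, (w, t) in enumerate(chunks):
            if t == 'V':
                for j in range(i + 1, min(i + 1 + NEAR, len(chunks))):
                    wj, tj = chunks[j]
                    if tj == 'P':
                        if j + 1 < len(chunks) and chunks[j + 1][1] == 'N':
                            patterns.append((w, wj, chunks[j + 1][0]))  # pattern 1
                        break  # nouns past the preposition belong to the PP
                    if tj == 'N':
                        patterns.append((w, None, wj))  # pattern 2
            if t == 'N':
                if i + 1 < len(chunks) and chunks[i + 1][1] == 'V':
                    patterns.append((chunks[i + 1][0], None, w))  # pattern 3
                if i + 2 < len(chunks) and chunks[i + 1][1] == 'P' and chunks[i + 2][1] == 'N':
                    patterns.append((w, chunks[i + 1][0], chunks[i + 2][0]))  # pattern 4
        return patterns

    chunks = [('see', 'V'), ('cat', 'N'), ('with', 'P'), ('telescope', 'N')]
    print(extract_patterns(chunks))
    # [('see', None, 'cat'), ('see', 'with', 'telescope'), ('cat', 'with', 'telescope')]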
10/17 Example
Consider this toy corpus:
–I see a cat with a telescope
–I see a ship in the sea with a spyglass
The following patterns are extracted (and their nouns classified):
–see,cat → see,{animal}
–see,with,telescope → see,with,{instrument}
–cat,with,telescope → {animal},with,{instrument}
–see,ship → see,{thing}
–see,in,sea → see,in,{place}
–see,with,spyglass → see,with,{instrument}
–ship,in,sea → {thing},in,{place}
11/17 Example
see,with,{instrument} has two occurrences
{animal},with,{instrument} has one occurrence
Thus, see with {instrument} is more probable than {animal} with {instrument}
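How such counts resolve an attachment can be sketched as below; the counts come from the toy corpus above, while the tie-breaking rule is an assumption (the real system applies the formula on the next slide):

    from collections import Counter

    # Triples (verb-or-class, preposition, class-of-second-noun)
    # counted over the toy corpus above.
    counts = Counter({
        ('see', 'with', '{instrument}'): 2,
        ('{animal}', 'with', '{instrument}'): 1,
    })

    def attach(verb, noun1_class, prep, noun2_class):
        """Attach the PP to whichever head has the more frequent triple
        (verb attachment wins ties in this sketch)."""
        v = counts[(verb, prep, noun2_class)]
        n = counts[(noun1_class, prep, noun2_class)]
        return 'verb' if v >= n else 'noun'

    print(attach('see', '{animal}', 'with', '{instrument}'))  # 'verb'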
12/17 Experiment
Now, with a real corpus, we apply a frequency formula over the extracted triples, where:
–X can be a specific verb or a noun's semantic class (see or {animal})
–P is a preposition (with)
–C2 is the class of the second noun ({instrument})
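The formula itself appears only as an image in the original slides; a plausible reconstruction from the variable definitions above, assuming a simple relative-frequency estimate over the triple counts, would be:

    P(X, P, C_2) \approx \frac{\mathrm{freq}(X, P, C_2)}{\sum_{C} \mathrm{freq}(X, P, C)}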
13/17 Experiment
From the 161-million-word corpus of Mexican Spanish newspaper text, the system obtained:
–893,278 selectional preferences for 5,387 verbs, and
–55,469 noun patterns (like {animal} with {instrument})
14/17 Evaluation
We tested the obtained selectional preferences by doing PP attachment disambiguation on 546 sentences from the LEXESP corpus (in Spanish), then compared manually against the correct PP attachments.
Results: precision 78.2%, recall 76.0%
15/17 Conclusions
Results are not as good as those obtained by other methods (up to 95%)
But we don't need any costly resources, such as:
–Treebanks
–Manually annotated corpora
–The web as corpus
16/17 Future Work
To use not only the 25 fixed semantic classes (top concepts) but the whole WordNet hierarchy
To use a word sense disambiguation (WSD) module
–Currently, if a word belongs to more than one class, all classes are taken into account
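As a hint of the whole-hierarchy variant, a sketch using NLTK's hypernym paths (again an assumed tool choice, not the authors' implementation): every hypernym on the way to the root becomes a candidate class, so cat also counts as {feline}, {mammal}, {animal}, and so on.

    from nltk.corpus import wordnet as wn

    def all_classes(noun):
        """Collect every hypernym on every path to the root,
        not just the top concept."""
        classes = set()
        for s in wn.synsets(noun, pos=wn.NOUN):
            for path in s.hypernym_paths():
                classes.update(h.name() for h in path)
        return classes

    print(sorted(all_classes("cat"))[:5])  # a few of the many classes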
17/17 Thank you!
hiram@sagitario.cic.ipn.mx
gelbukh@cic.ipn.mx