Acquiring Reliable Predicate-argument Structures from Raw Corpora for Case Frame Compilation Daisuke Kawahara 1 and Sadao Kurohashi 1,2 LREC2010, 2010/05/20.


Acquiring Reliable Predicate-argument Structures from Raw Corpora for Case Frame Compilation
Daisuke Kawahara 1 and Sadao Kurohashi 1,2
1 National Institute of Information and Communications Technology
2 Kyoto University
LREC2010, 2010/05/20

2 Background
NLP analyzers so far
– (Mainly) supervised, (relatively) knowledge-poor
  e.g., PP-attachment or parsing
    Mary ate the salad with a fork
    Mary ate the salad with mushrooms
– Only 1.5% of bilexical dependencies were learned [Bikel, 04]
→ Toward knowledge-oriented NLP
– Automatically compile case frames and integrate them into NLP analyzers/applications

3 Related work
Subcategorization frames
– [Brent, 93] [Ushioda et al., 93] [Manning, 93] [Briscoe and Carroll, 97] [Korhonen, 02] …
  e.g., She greeted me.       NP(sbj) greet NP(obj)
  e.g., She gave him a book.  NP(sbj) give NP(obj) NP(obj)

4 Related work
Subcategorization frames
– [Brent, 93] [Ushioda et al., 93] [Manning, 93] [Briscoe and Carroll, 97] [Korhonen, 02] …
(Manually compiled) semantic frames
– FrameNet [Baker et al., 98], PropBank [Palmer et al., 05]
Japanese semantic case frames
– Semantic marker-based: [Haruno, 95] [Utsuro et al., 96]
– Example-based: [Kawahara and Kurohashi, 06]

5 Compilation of Japanese semantic case frames [Kawahara and Kurohashi, 06]

                            CS   examples (in English)
yaku (1) (bake)             ga   I:18, person:15, craftsman:10, …
                            wo   bread:2484, meat:1521, cake:1283, …
                            de   oven:1630, frying pan:1311, …
yaku (2) (have difficulty)  ga   teacher:3, government:3, person:3, …
                            wo   hand:2950
                            ni   attack:18, action:15, son:15, …
yaku (3) (burn)             ga   company:1, distributor:1, …
                            wo   data:178, file:107, copy:9, …
                            ni   R:1583, CD:664, CDR:3, …
…

ga: nominative, wo: accusative, ni: dative, de: instrument

6 Compilation of English case frames [Kawahara and Uchimoto, 08]
[Flow diagram] Sentence extraction → Sentences → Parsing and filtering → Predicate-argument structures → Clustering → Case frames
Labels in the diagram: Dependency parser, 89.9% → 91.5% (short sentences); WordNet

7 Examples of obtained case frames [Kawahara and Uchimoto, 08]

           CS      examples
burn (1)   sbj     they:262, it:113, protester:99, …
           obj     flag:247, effigy:81, house:67, …
           pp:in   :29, ramallah:14, brisbane:11, …
           pp:for  week:15, hour:6, month:5, …
burn (2)   sbj     candle:26, lamp:5
           pp:on   motor-scooter:7, altar:3, platform:1, …
           pp:for  day:2, steinhaeuser:1
…

CS: surface cases and prepositions (sbj, obj, obj2, sbar, pp:for, pp:in, …)
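One way to picture the case frames above is as a mapping from case slot to counted example words. The sketch below is only an illustration under that assumption: the `CaseFrame` type and the `frames_accepting` helper are hypothetical names, not part of the authors' system, and only the examples visible on the slide are included (the `pp:in` slot is left out because its first entry is incomplete in the transcript).

```python
from collections import Counter
from typing import Dict, List

# Hypothetical representation: case slot -> counted example words.
CaseFrame = Dict[str, Counter]

# Only the examples shown on the slide; the real frames contain more ("…").
frames: Dict[str, CaseFrame] = {
    "burn (1)": {
        "sbj": Counter({"they": 262, "it": 113, "protester": 99}),
        "obj": Counter({"flag": 247, "effigy": 81, "house": 67}),
        "pp:for": Counter({"week": 15, "hour": 6, "month": 5}),
    },
    "burn (2)": {
        "sbj": Counter({"candle": 26, "lamp": 5}),
        "pp:on": Counter({"motor-scooter": 7, "altar": 3, "platform": 1}),
        "pp:for": Counter({"day": 2, "steinhaeuser": 1}),
    },
}


def frames_accepting(all_frames: Dict[str, CaseFrame], slot: str, word: str) -> List[str]:
    """Return the names of frames whose `slot` lists `word` among its examples."""
    return [
        name
        for name, frame in all_frames.items()
        # A missing slot behaves like an empty Counter, which yields 0 for any word.
        if frame.get(slot, Counter())[word] > 0
    ]


if __name__ == "__main__":
    print(frames_accepting(frames, "sbj", "candle"))
    print(frames["burn (1)"]["obj"].most_common(1))
```

Keeping counts rather than bare word lists preserves the frequency information shown on the slide, which a downstream analyzer could use to prefer one frame over another.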

8 Compilation of English case frames
[Flow diagram] Sentence extraction → Sentences → Parsing and filtering → Predicate-argument structures → Clustering → Case frames
Labels in the diagram: Dependency parser, 89.9% → 91.5% (short sentences); WordNet

9 Procedure
1. Apply POS tagging and chunking to a raw corpus
2. Filter out unreliable and inappropriate sentences and chunks
3. Extract predicate-argument structures and apply PP-attachment disambiguation if a PP exists

Example: I borrowed the kits with a $25.00 deposit, and …
1. NP:[I] VP:[borrowed] NP:[the kits] PP:[with] NP:[a $25.00 deposit] O:, O:and …
2. NP:[I] VP:[borrowed] NP:[the kits] PP:[with] NP:[a $25.00 deposit]
3. sbj:[I] pred:[borrow] obj:[the kits] pp:with:[a $25.00 deposit]

10 1. POS tagging and chunking
POS tagging
– Tsuruoka’s tagger [Tsuruoka and Tsujii, 05]
  accuracy: 97.1%
Chunking
– YamCha chunker [Kudo and Matsumoto, 01]
  precision: 93.89%, recall: 93.06%, F:

11 2. Filtering of unreliable sentences and chunks
sentences to be discarded
– a sentence that begins with a VP or a PP
– a sentence that ends with a question mark
– a sentence that has a comma adjacent to a VP
– a sentence that contains a sign (-, ;, …)
– a sentence that does not have an NP before a VP
– a sentence in which the first VP is a participle or an infinitive
chunks to be discarded
– chunks following the first comma outside an NP
– chunks following wh-clauses
– chunks following the second VP except participles and infinitives
Coverage: 17.9%
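The filtering rules above can be sketched as simple checks over a chunk sequence. This is a minimal illustration, not the authors' implementation: the `(label, text)` chunk representation, the function names, and the treatment of a sentence without any VP are assumptions. Only the two signs named on the slide are checked, and the rules that need POS tags or clause information (participle/infinitive VPs, wh-clauses, the second VP) are left out.

```python
from typing import List, Optional, Tuple

# Assumed representation: (label, text); "O" marks a token outside any chunk.
Chunk = Tuple[str, str]

# The slide's list of signs is open-ended ("…"); only these two are checked here.
SIGNS = {"-", ";"}


def is_comma(chunk: Chunk) -> bool:
    """True for a comma that the chunker left outside every chunk (so outside any NP)."""
    return chunk == ("O", ",")


def discard_sentence(chunks: List[Chunk]) -> bool:
    """Apply the sentence-level rules that need only chunk labels and surface text."""
    if not chunks:
        return True
    labels = [label for label, _ in chunks]
    if labels[0] in ("VP", "PP"):                 # begins with a VP or a PP
        return True
    if chunks[-1][1].endswith("?"):               # ends with a question mark
        return True
    for i, chunk in enumerate(chunks):            # comma adjacent to a VP
        if is_comma(chunk):
            neighbours = labels[max(i - 1, 0):i] + labels[i + 1:i + 2]
            if "VP" in neighbours:
                return True
    if any(label == "O" and text in SIGNS for label, text in chunks):
        return True                               # contains a sign
    if "VP" not in labels:                        # assumption: no predicate, nothing to extract
        return True
    first_vp = labels.index("VP")
    return "NP" not in labels[:first_vp]          # no NP before the (first) VP


def keep_chunks(chunks: List[Chunk]) -> List[Chunk]:
    """Drop the first comma outside an NP and every chunk after it."""
    for i, chunk in enumerate(chunks):
        if is_comma(chunk):
            return chunks[:i]
    return list(chunks)


def filter_sentence(chunks: List[Chunk]) -> Optional[List[Chunk]]:
    """Return the reliable chunks of a sentence, or None if the sentence is discarded."""
    return None if discard_sentence(chunks) else keep_chunks(chunks)


example = [("NP", "I"), ("VP", "borrowed"), ("NP", "the kits"), ("PP", "with"),
           ("NP", "a $25.00 deposit"), ("O", ","), ("O", "and")]

if __name__ == "__main__":
    print(filter_sentence(example))
```

On the slides' running example this keeps the five chunks before the comma. The deliberately strict rules trade coverage for precision, which is consistent with the 17.9% coverage reported above.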

12 Evaluation of filtering results
VP
– precision: 96.46% (517/536)
– 12 of the 19 errors are not harmful, e.g., “successfully contended”
  → precision: 98.69% (529/536)
NP
– precision: 96.18% (1559/1621)
– 38 of the 62 errors are not harmful, e.g., “about 10,000 diamond miners”
  → precision: 98.52% (1597/1621)

His firm favors selected computer, drug and pollution-control stocks.
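The two precision figures for each chunk type follow from the counts on the slide: the second figure counts the "not harmful" errors as correct. A short arithmetic check, with hypothetical function names:

```python
def precision(correct: int, total: int) -> float:
    """Precision as a percentage, rounded to two decimals."""
    return round(100.0 * correct / total, 2)


def lenient_precision(correct: int, harmless_errors: int, total: int) -> float:
    """Precision when errors judged not harmful are counted as correct."""
    return precision(correct + harmless_errors, total)


if __name__ == "__main__":
    # VP chunks: 517 of 536 correct; 12 of the 19 errors are not harmful.
    print(precision(517, 536), lenient_precision(517, 12, 536))
    # NP chunks: 1559 of 1621 correct; 38 of the 62 errors are not harmful.
    print(precision(1559, 1621), lenient_precision(1559, 38, 1621))
```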

13 3. Extract predicate-argument structures from chunks
Use straightforward rules
– VP → pred
– NP preceding the predicate → sbj
– NP following the predicate → obj
– NP following “obj” → obj2
– SBAR → sbar
– a pair of adjoining PP and NP → pp
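The rules above map almost directly onto a single pass over the filtered chunks. The sketch below is an illustration under stated assumptions rather than the authors' code: chunks are `(label, text)` pairs, only the first VP is taken as the predicate, and both lemmatization of the predicate (the slides show `pred:[borrow]`) and PP-attachment disambiguation are omitted. The name `extract_pas` is hypothetical.

```python
from typing import List, Tuple

# Assumed representation: (label, text) chunk pairs after filtering.
Chunk = Tuple[str, str]
Slot = Tuple[str, str]  # (case slot, text), e.g. ("obj", "the kits")


def extract_pas(chunks: List[Chunk]) -> List[Slot]:
    """Turn a filtered chunk sequence into a flat predicate-argument structure."""
    pas: List[Slot] = []
    seen_pred = False
    i = 0
    while i < len(chunks):
        label, text = chunks[i]
        if label == "VP" and not seen_pred:
            pas.append(("pred", text))            # VP -> pred (surface form, not lemmatized)
            seen_pred = True
        elif label == "PP" and i + 1 < len(chunks) and chunks[i + 1][0] == "NP":
            # A pair of adjoining PP and NP -> pp:<preposition>
            pas.append((f"pp:{text.lower()}", chunks[i + 1][1]))
            i += 1                                # the NP has been consumed
        elif label == "NP":
            if not seen_pred:
                pas.append(("sbj", text))         # NP preceding the predicate
            elif pas[-1][0] == "pred":
                pas.append(("obj", text))         # NP following the predicate
            elif pas[-1][0] == "obj":
                pas.append(("obj2", text))        # NP following "obj"
        elif label == "SBAR":
            pas.append(("sbar", text))
        i += 1
    return pas


if __name__ == "__main__":
    chunks = [("NP", "I"), ("VP", "borrowed"), ("NP", "the kits"),
              ("PP", "with"), ("NP", "a $25.00 deposit")]
    print(extract_pas(chunks))
```

Because the rules rely only on chunk order, they are reliable only on input that has already passed the strict filtering step; on unfiltered text the same rules would mislabel many arguments.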

14 Experiments
From 2G English sentences, we acquired 2.4G predicate-argument structures

sbj:[the super-user] pred:[raise] obj:[the hard limits]
sbj:[it] pred:[strengthen] obj:[the action]
sbj:[he] pred:[raise] obj:[a hand]
sbj:[this web page] pred:[be linked] pp:to:[any other web sites]
sbj:[a user] pred:[view] obj:[items] pp:from:[your catalog]
sbj:[you] pred:[read] obj:[this]

Manual evaluation of 200 predicate-argument structures: 97% are correct
– incorrect objects of say, know and so on
– incorrect detection of “sbar”
– errors of PP-attachment disambiguation

He said the assets to be sold would be...

15 Conclusion and future work
Acquired high-quality predicate-argument structures for case frame compilation
– Real use of English predicates
Future work
– Apply clustering to compile case frames [Kawahara and Uchimoto, 08]
– Integrate case frames into parsing (and other applications)
  cf. [Zeman, 02] for subcategorization frames
      [Kawahara and Kurohashi, 06] for case frames