Presentation transcript:

Page 1 SRL via Generalized Inference Vasin Punyakanok, Dan Roth, Wen-tau Yih, Dav Zimak, Yuancheng Tu Department of Computer Science University of Illinois at Urbana-Champaign

Page 2 Semantic Role Labeling
- For each verb in a sentence:
  1. identify all constituents that fill a semantic role
  2. determine their roles (Agent, Patient, Instrument, ...) and their adjuncts (e.g., Locative, Temporal, Manner)
- The PropBank project [Kingsbury & Palmer 02] provides a large human-annotated corpus of semantic verb-argument relations.
- CoNLL-2004 shared task [Carreras & Màrquez 04]

Page 3 Example
- A0 represents the leaver
- A1 represents the thing left
- A2 represents the benefactor
- AM-LOC is an adjunct indicating the location of the action
- V marks the verb
(In the deck's running example "I left my nice pearls to her": I is A0, left is V, my nice pearls is A1, and her is A2.)

Page 4 Argument Types
- A0-A5 and AA have different semantics for each verb, as specified in the PropBank frame files.
- 13 types of adjuncts are labeled AM-XXX, where XXX specifies the adjunct type.
- C-XXX is used to specify the continuity of argument XXX.
- When an argument is a relative pronoun that refers to the actual agent outside the clause, the actual agent is labeled with the appropriate argument type XXX, while the relative pronoun is labeled R-XXX.

Page 5 Examples: C-XXX and R-XXX

Page 6 Outline
- Find potential argument candidates
- Classify arguments to types
- Inference for argument structure
  - Cost function
  - Constraints
  - Integer linear programming (ILP)
- Results & discussion

Page 7 Find Potential Arguments
- An argument can be any consecutive sequence of words: I left my nice pearls to her (brackets over the sentence mark candidate argument boundaries)
- Restrict potential arguments with two word-level decisions:
  - BEGIN(word) = 1 means "word begins an argument"
  - END(word) = 1 means "word ends an argument"
- (w_i, ..., w_j) is a potential argument iff BEGIN(w_i) = 1 and END(w_j) = 1
- This reduces the set of potential arguments.

Page 8 Details – Word-level Classifier
- BEGIN(word): learn a function B(word, context, structure) → {0,1}
- END(word): learn a function E(word, context, structure) → {0,1}
- POTARG = {arg | BEGIN(first(arg)) = 1 and END(last(arg)) = 1}
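To make the candidate-generation step concrete, here is a minimal sketch (not the authors' code): `begin` and `end` stand in for the 0/1 outputs of the two word-level classifiers described on Page 8, with hypothetical values chosen only for illustration.

```python
# Sketch: build POTARG from word-level BEGIN/END decisions.
def potential_arguments(words, begin, end):
    """Return all spans (i, j) with BEGIN(w_i) = 1 and END(w_j) = 1."""
    pot_arg = []
    for i in range(len(words)):
        if begin[i] != 1:
            continue
        for j in range(i, len(words)):
            if end[j] == 1:
                pot_arg.append((i, j))  # candidate argument w_i ... w_j
    return pot_arg

words = "I left my nice pearls to her".split()
begin = [1, 0, 1, 0, 0, 1, 1]  # hypothetical classifier outputs
end   = [1, 0, 0, 0, 1, 0, 1]
print(potential_arguments(words, begin, end))
```

Without the BEGIN/END filter every one of the O(n²) spans would be a candidate; the filter keeps only spans whose endpoints the word-level classifiers accept.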

Page 9 Argument Type Likelihood
- Assign a type likelihood: how likely is it that argument a is of type t?
- For all a ∈ POTARG and t ∈ T, estimate P(argument a = type t).
- Running example: I left my nice pearls to her, with candidate arguments scored over types such as A0, A1, C-A1, and Ø.

Page 10 Details – Phrase-level Classifier
- Learn a classifier ARGTYPE(arg): Φ_P(arg) → {A0, A1, ..., C-A0, ..., AM-LOC, ...}
  - Predict argmax over t ∈ {A0, A1, ..., C-A0, ..., AM-LOC, ...} of w_t · Φ_P(arg)
- Estimate probabilities with a softmax: P(a = t) = exp(w_t · Φ_P(a)) / Z
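A hedged sketch of the scoring on Page 10: a linear score w_t · Φ_P(arg) per type, converted to probabilities with a softmax. Feature extraction is omitted and the numeric scores below are invented for illustration.

```python
import math

def type_probabilities(scores):
    """Softmax over per-type linear scores: P(a = t) = exp(score_t) / Z."""
    m = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {t: math.exp(s - m) for t, s in scores.items()}
    z = sum(exp_scores.values())
    return {t: v / z for t, v in exp_scores.items()}

# Hypothetical scores w_t · Φ_P(arg) for one candidate argument.
scores = {"A0": 2.1, "A1": 0.3, "A2": -0.5, "AM-LOC": -1.2, "None": 0.0}
probs = type_probabilities(scores)
print(max(probs, key=probs.get), probs)
```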

Page 11 What is a Good Assignment?
- Likelihood of being correct: P(Arg a = Type t), if t is the correct type for argument a.
- For a set of arguments a_1, a_2, ..., a_n, the expected number of correct arguments is Σ_i P(a_i = t_i).
- We search for the assignment with the maximum expected number of correct arguments.

Page 12 Inference
- Maximize the expected number correct: T* = argmax_T Σ_i P(a_i = t_i)
- Subject to some constraints: structural and linguistic (e.g., R-A1 ⇒ A1)
- Illustration on "I left my nice pearls to her": Independent Max gives Cost = 1.8; the best Non-Overlapping assignment gives Cost = 1.6; adding Blue ⇒ Red together with Non-Overlapping gives Cost = 1.4.
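The objective on Pages 11-12 can be made concrete with a brute-force version, assumed here purely for exposition (it enumerates every assignment and is exponential in the number of candidates): pick the joint labeling that maximizes Σ_i P(a_i = t_i) among labelings whose non-null arguments do not overlap.

```python
from itertools import product

def best_assignment(candidates, probs, types, null="None"):
    """candidates: list of (start, end) spans; probs[i][t] = P(a_i = t)."""
    def overlap(s1, s2):
        return not (s1[1] < s2[0] or s2[1] < s1[0])

    best, best_score = None, float("-inf")
    for labels in product(types, repeat=len(candidates)):
        # Two overlapping candidates may not both receive a non-null label.
        if any(labels[i] != null and labels[j] != null
               and overlap(candidates[i], candidates[j])
               for i in range(len(candidates))
               for j in range(i + 1, len(candidates))):
            continue
        score = sum(probs[i][t] for i, t in enumerate(labels))
        if score > best_score:
            best, best_score = labels, score
    return best, best_score
```

The ILP formulation on the next slides expresses the same objective and constraints declaratively, so an off-the-shelf solver can handle realistic numbers of candidates.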

Page 13 LP Formulation – Linear Cost
- Cost function: Σ_{a ∈ POTARG} P(a = t) = Σ_{a ∈ POTARG, t ∈ T} P(a = t) · x_{a=t}
- Indicator variables: x_{a1=A0}, x_{a1=A1}, ..., x_{a4=AM-LOC}, x_{a4=Ø} ∈ {0,1}
- Total Cost = p(a1=A0) · x(a1=A0) + p(a1=Ø) · x(a1=Ø) + ... + p(a4=Ø) · x(a4=Ø)

Page 14 Linear Constraints (1/2)
- Binary values: ∀ a ∈ POTARG, t ∈ T: x_{a=t} ∈ {0,1}
- Unique labels: ∀ a ∈ POTARG: Σ_{t ∈ T} x_{a=t} = 1
- No overlapping or embedding: if a1 and a2 overlap, then x_{a1=Ø} + x_{a2=Ø} ≥ 1

Page 15 Linear Constraints (2/2)
- No duplicate argument classes: Σ_{a ∈ POTARG} x_{a=A0} ≤ 1
- R-XXX: ∀ a2 ∈ POTARG: Σ_{a ∈ POTARG} x_{a=A0} ≥ x_{a2=R-A0}
- C-XXX: ∀ a2 ∈ POTARG: Σ_{(a ∈ POTARG) ∧ (a is before a2)} x_{a=A0} ≥ x_{a2=C-A0}
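Pulling Pages 13-15 together, here is one way the ILP could be written with the PuLP library. This is a sketch under assumptions, not the original system (the deck does not say which solver was used); `candidates` and `probs` are the hypothetical POTARG spans and type probabilities from the earlier sketches.

```python
from pulp import LpProblem, LpVariable, LpMaximize, LpBinary, lpSum

def srl_ilp(candidates, probs, types, core=("A0", "A1", "A2"), null="None"):
    prob = LpProblem("srl_inference", LpMaximize)
    x = {(a, t): LpVariable(f"x_{a}_{t}".replace("-", "_"), cat=LpBinary)
         for a in range(len(candidates)) for t in types}

    # Objective (Page 13): expected number of correct arguments.
    prob += lpSum(probs[a][t] * x[a, t]
                  for a in range(len(candidates)) for t in types)

    # Unique label per candidate (Page 14).
    for a in range(len(candidates)):
        prob += lpSum(x[a, t] for t in types) == 1

    # No overlapping or embedding (Page 14): one of the two must be null.
    for a1 in range(len(candidates)):
        for a2 in range(a1 + 1, len(candidates)):
            s1, s2 = candidates[a1], candidates[a2]
            if not (s1[1] < s2[0] or s2[1] < s1[0]):
                prob += x[a1, null] + x[a2, null] >= 1

    for t in core:  # Page 15 constraints, shown here for the core types.
        # No duplicate argument classes.
        prob += lpSum(x[a, t] for a in range(len(candidates))) <= 1
        # R-XXX may appear only if some argument is labeled XXX.
        if "R-" + t in types:
            for a2 in range(len(candidates)):
                prob += lpSum(x[a, t] for a in range(len(candidates))) >= x[a2, "R-" + t]
        # C-XXX may appear only if XXX appears earlier in the sentence.
        if "C-" + t in types:
            for a2 in range(len(candidates)):
                prob += lpSum(x[a, t] for a in range(len(candidates))
                              if candidates[a][1] < candidates[a2][0]) >= x[a2, "C-" + t]

    prob.solve()
    return {candidates[a]: t for (a, t), var in x.items() if var.value() == 1}
```

Each constraint block maps one-to-one to a bullet on Pages 14-15.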

Page 16 Results on Perfect Boundaries
- Assume the boundaries of the arguments (in both training and testing) are given.
- Development set: Precision, Recall, and F1 are reported without inference and with inference.

Page 17 Results
- Overall F1 on the test set: 66.39

Page 18 Discussion
- Data analysis is important! F1 went from ~45% to ~65% through feature engineering, parameter tuning, ...
- Global inference helps!
  - Using all constraints gains more than 1% F1 compared to using only the non-overlapping constraint.
  - Easy and fast: 15-20 minutes.
- Where does the performance difference come from? Not from word-based vs. chunk-based.

Page 19 Thank you