Semantic Role Labeling


Semantic Role Labeling
Presented to LING-7800 by Shumin Wu
Prepared by Lee Becker and Shumin Wu

Task
Given a sentence:
- Identify predicates and their arguments
- Automatically label them with semantic roles
From: Mary slapped John with a frozen trout
To: [AGENT Mary] [PREDICATE slapped] [PATIENT John] [INSTRUMENT with a frozen trout]

SRL Pipeline: Argument Identification
Stages: syntactic parse → prune constituents → candidate arguments → argument identification → argument classification → structural inference → semantic roles
Example: 'He walked in the park'
- Pruning yields candidates NP1 (He), VP, V (walked), PP (in the park), NP2 (the park)
- Argument identification: NP1 yes, VP no, V given (predicate), PP yes, NP2 no
- Argument classification: NP1 = Agent/Patient, PP = Location/Patient
- Structural inference: NP1 = Agent, V = Predicate, PP = Location

Pruning Algorithm [Xue, Palmer 2004]
Goal: reduce the overall number of constituents to label
Reasoning: save training time
- Step 1: Designate the predicate as the current node and collect its sisters, unless a sister is coordinated with the predicate. If a sister is a PP, also collect its immediate children.
- Step 2: Reset the current node to its parent node.
- Repeat Steps 1 and 2 until the top node is reached.

Pruning Algorithm [Xue, Palmer 2004]
Example parse with coordinated clauses (e.g., 'Strike and mismanagement were cited and Premier Ryzhkov warned of tough measures'): constituents in the clause coordinated with the predicate's clause are not collected as candidates.
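
To make the procedure concrete, here is a minimal sketch of the pruning heuristic over an nltk ParentedTree; the coordination check is simplified (it skips CC sisters and sisters sharing the current node's label when a CC is present), so treat it as illustrative rather than a faithful reimplementation of Xue and Palmer's algorithm.

```python
from nltk.tree import ParentedTree

def prune_candidates(predicate):
    """Collect candidate argument constituents for `predicate`, the
    preterminal node of the target verb in a ParentedTree."""
    candidates = []
    current = predicate
    while current.parent() is not None:
        parent = current.parent()
        has_cc = any(isinstance(s, ParentedTree) and s.label() == "CC" for s in parent)
        for sister in parent:
            if sister is current or not isinstance(sister, ParentedTree):
                continue
            # simplified coordination check: skip conjunctions and coordinated sisters
            if has_cc and sister.label() in ("CC", current.label()):
                continue
            candidates.append(sister)
            # if a sister is a PP, also collect its immediate children
            if sister.label() == "PP":
                candidates.extend(c for c in sister if isinstance(c, ParentedTree))
        current = parent  # reset the current node to the parent and repeat
    return candidates
```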

SRL Training
1. Extract features from the sentence, syntactic parse, and other sources for each candidate constituent
2. Train a statistical ML classifier to identify arguments
3. Extract features the same as or similar to those in step 1
4. Train a statistical ML classifier to select the appropriate label for each argument (multiclass or one-vs-all)
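
As a rough illustration of the two classifiers described above (the feature names and the DictVectorizer/LogisticRegression choice are placeholders, not taken from the slides):

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def make_classifier():
    # DictVectorizer maps feature dicts such as
    # {"phrase_type": "NP", "path": "VB↑VP↓NP", "position": "before"} to sparse vectors
    return make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))

# Stage 1: argument identification (binary: is this constituent an argument?)
arg_identifier = make_classifier()
# Stage 2: argument classification (multiclass over role labels, or one-vs-all)
arg_labeler = make_classifier()

# Hypothetical training calls, assuming lists of feature dicts and labels:
# arg_identifier.fit(candidate_features, is_argument)
# arg_labeler.fit(argument_features, role_labels)
```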

Training Data
Need gold standard annotations:
- Syntactic parses (constituent/phrase-based or dependency-based)
- Semantic roles (frame elements, arguments, etc.)
Lexical resources:
- FrameNet (Baker et al., 1998): 49,000 annotated sentences from the BNC; 99,232 annotated frame elements; 1,462 target words from 67 frames (927 verbs, 339 nouns, 175 adjectives)
- PropBank (Palmer, Kingsbury, Gildea, 2005): annotation over the Penn Treebank; ??? verb predicates
- Salsa (Erk, Kowalski, Pinkal, 2003): annotation over the German 1.5-million-word Tiger corpus; FrameNet semantic roles
Various bakeoffs: SemEval, CoNLL

Feature Extraction
The sentence and parses by themselves provide little useful information for selecting semantic role labels.
We need algorithms that derive features from these data, providing clues about the relationship between the constituent and the sentence as a whole.

Features: Phrase Type
Intuition: different roles tend to be realized by different syntactic categories.
FrameNet Communication_noise frame:
- Speaker is often a noun phrase
- Topic is typically a noun phrase or prepositional phrase
- Medium is usually a prepositional phrase
[SPEAKER The angry customer] yelled at the fast food worker [TOPIC about his soggy fries] [MEDIUM over the noisy intercom].

Commonly Used Features: Phrase Type
Phrase type indicates the syntactic category of the phrase expressing the semantic role.
Syntactic categories come from the Penn Treebank.
FrameNet distributions:
- NP (47%) – noun phrase
- PP (22%) – prepositional phrase
- ADVP (4%) – adverbial phrase
- PRT (2%) – particles (e.g., make something up)
- SBAR (2%), S (2%) – clauses

Commonly Used Features: Phrase Type
Example parse (Gildea & Jurafsky, 2002): 'He heard [THEME the sound of liquid slurping in a metal container] as Farell approached [GOAL him] [SOURCE from behind]' — target: heard.

Commonly Used Features: Governing Category
Intuition: there is often a link between semantic roles and their syntactic realization as subject or direct object.
'He drove the car over the cliff' — the subject NP is more likely to fill the Agent role.
Grammatical functions may not be directly available in all parser representations.

Commonly Used Features: Governing Category
Approximates grammatical function from a constituent parse.
Governing category (aka gov) takes two values:
- S: subjects
- VP: objects of verbs
In practice it is used only on NPs.

Commonly Used Features: Governing Category
Algorithm:
- Start at the NP node of interest
- Traverse links upward until an S or VP is encountered
- NPs under S nodes → subject
- NPs under VP nodes → object
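
A minimal sketch of the gov feature over an nltk ParentedTree (label names follow the Penn Treebank; the clause check is simplified to any label starting with S):

```python
from nltk.tree import ParentedTree

def governing_category(np_node):
    """Walk upward from an NP until an S or VP ancestor is found:
    S suggests subject, VP suggests object."""
    node = np_node.parent()
    while node is not None:
        label = node.label()
        if label.startswith("S"):   # S, SQ, SINV, ... all count as clause nodes here
            return "S"
        if label == "VP":
            return "VP"
        node = node.parent()
    return None

t = ParentedTree.fromstring("(S (NP (PRP He)) (VP (VBD drove) (NP (DT the) (NN car))))")
print(governing_category(t[0]))     # S  -> likely subject
print(governing_category(t[1][1]))  # VP -> likely object
```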

Features: Governing Category
Example: 'Can you blame the dealer for being late?' — you: subject (gov = S), the dealer: object (gov = VP), for being late: null (not an NP).

Features: Governing Category
Governing category does not perfectly discriminate grammatical function.
Example: 'He left town yesterday' — He: subject, town: object, yesterday: adjunct; both 'town' and 'yesterday' receive governing category VP.

Features: Governing Category
In this case, indirect and direct objects are both given governing category VP.
Example: 'He gave me a new hose' — He: subject, me: indirect object, a new hose: direct object.

Features: Parse Tree Path
Intuition: gov finds grammatical function independent of the target word; we want something that factors in the relation to the target word.
Representation: a string of symbols indicating the upward and downward traversal from the target word to the constituent of interest.

Features: Parse Tree Path
Example: 'He ate some pancakes' — path from 'ate' to 'He': VB↑VP↑S↓NP; path from 'ate' to 'some pancakes': VB↑VP↓NP.
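
A sketch of how the path string can be computed from an nltk ParentedTree, given the predicate's preterminal node and the argument constituent (ancestor chains are compared by identity to find the lowest common ancestor):

```python
from nltk.tree import ParentedTree

def ancestor_chain(node):
    """The node plus all of its ancestors, from the node up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent()
    return chain

def tree_path(pred_node, arg_node):
    pred_chain = ancestor_chain(pred_node)
    arg_chain = ancestor_chain(arg_node)
    lca = next(n for n in pred_chain if any(n is m for m in arg_chain))
    up = []                 # labels from the predicate up to the common ancestor
    for n in pred_chain:
        up.append(n.label())
        if n is lca:
            break
    down = []               # labels from below the common ancestor down to the argument
    for n in arg_chain:
        if n is lca:
            break
        down.append(n.label())
    down.reverse()
    if down:
        return "↑".join(up) + "↓" + "↓".join(down)
    return "↑".join(up)

t = ParentedTree.fromstring("(S (NP (PRP He)) (VP (VB ate) (NP (DT some) (NNS pancakes))))")
print(tree_path(t[1][0], t[0]))     # VB↑VP↑S↓NP
print(tree_path(t[1][0], t[1][1]))  # VB↑VP↓NP
```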

Features: Parse Tree Path
Frequency  Path                 Description
14.2%      VB↑VP↓PP             PP argument/adjunct
11.8%      VB↑VP↑S↓NP           subject
10.1%      VB↑VP↓NP             object
7.9%       VB↑VP↑VP↑S↓NP        subject (embedded VP)
4.1%       VB↑VP↓ADVP           adverbial adjunct
3.0%       NN↑NP↑NP↓PP          prepositional complement of noun
1.7%       VB↑VP↓PRT            adverbial particle
1.6%       VB↑VP↑VP↑VP↑S↓NP
14.2%      none                 no matching parse constituent
31.4%      Other

Features: Parse Tree Path
Issues:
- Parser quality (error rate)
- Data sparseness: 2,978 possible values excluding frame elements with no matching parse constituent; 4,086 possible values in total
- Of 35,138 frame elements identified as NP, only 4% have a path feature without a VP or S ancestor [Gildea and Jurafsky, 2002]

Features: Position
Intuition: grammatical function is highly correlated with position in the sentence — subjects appear before the verb, objects after it.
Representation: binary value — does the node appear before or after the predicate?
Other motivations [Gildea and Jurafsky, 2002]:
- Overcome errors due to incorrect parses
- Assess ability to perform SRL without parse trees

Features: Position
'Can you blame the dealer for being late?' — 'you' appears before the predicate 'blame'; 'the dealer' and 'for being late' appear after it.
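
A minimal sketch of the position feature computed from token offsets alone (no parse needed); the token indices here are illustrative:

```python
def position_feature(constituent_start, predicate_index):
    """'before' if the constituent begins before the predicate token, else 'after'."""
    return "before" if constituent_start < predicate_index else "after"

tokens = ["Can", "you", "blame", "the", "dealer", "for", "being", "late", "?"]
pred = tokens.index("blame")
print(position_feature(tokens.index("you"), pred))     # before
print(position_feature(tokens.index("dealer"), pred))  # after
```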

Features: Voice
Intuition: grammatical function varies with voice — direct objects in active clauses correspond to subjects in passive clauses.
'He slammed the door.' / 'The door was slammed by him.'
Approach: use passive-identifying patterns/templates — a passive auxiliary (to be, to get) followed by a past participle.
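
One way to realize the passive template over POS-tagged tokens: look for a form of 'to be' or 'to get' immediately (or across adverbs) before a past participle. A rough sketch under those assumptions:

```python
BE_GET = {"be", "am", "is", "are", "was", "were", "been", "being",
          "get", "gets", "got", "gotten", "getting"}

def is_passive(tagged, verb_index):
    """tagged: list of (word, POS) pairs; verb_index: index of the target verb."""
    if tagged[verb_index][1] != "VBN":       # passives need a past participle
        return False
    i = verb_index - 1
    while i >= 0 and tagged[i][1] in ("RB", "RBR", "RBS"):   # skip adverbs
        i -= 1
    return i >= 0 and tagged[i][0].lower() in BE_GET

sentence = [("The", "DT"), ("door", "NN"), ("was", "VBD"),
            ("slammed", "VBN"), ("by", "IN"), ("him", "PRP"), (".", ".")]
print(is_passive(sentence, 3))   # True
```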

Features: Subcategorization
Intuition: knowing the number of arguments to the verb changes the possible set of semantic roles.
Example: 'John sold Mary the book' — the ditransitive frame (V NP NP) signals Recipient ('Mary') and Theme ('the book').

Features: Head Word
Intuition: head words of noun phrases can be indicative of selectional restrictions on the semantic types of role fillers.
- Noun phrases headed by Bill, brother, or he are more likely to be the Speaker
- Those headed by proposal, story, or question are more likely to be the Topic
Approach: most parsers can mark the head word; head rules applied to a constituent parse tree can also identify head words.

Features: Head Words
Head rules – a way of deterministically identifying the head word of a phrase.
Sample head percolation rules [Johansson and Nugues]:
- ADJP → NNS QP NN $ ADVP JJ VBN VBG ADJP JJR NP JJS DT FW RBR RBS SBAR RB
- ADVP → RB RBR RBS FW ADVP TO CD JJR JJ IN NP JJS NN
- CONJP → CC RB IN
- FRAG → (NN* | NP) W* SBAR (PP | IN) (ADJP | JJ) ADVP RB
- NP, NX → (NN* | NX) JJR CD JJ JJS RB QP NP-e NP
- PP, WHPP → (first non-punctuation after preposition)
- PRN → (first non-punctuation)
- PRT → RP
- S → VP *-PRD S SBAR ADJP UCP NP
- VP → VBD VBN MD VBZ VB VBG VBP VP *-PRD ADJP NN NNS NP
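
A toy head-word finder in the spirit of the rules above (the rule table is abridged and the left-to-right search direction is a simplification; real head-rule sets specify a per-category search direction and a fuller label inventory):

```python
from nltk.tree import Tree

HEAD_RULES = {  # abridged from the table above
    "S":  ["VP", "S", "SBAR", "ADJP", "UCP", "NP"],
    "VP": ["VBD", "VBN", "MD", "VBZ", "VB", "VBG", "VBP", "VP", "ADJP", "NN", "NNS", "NP"],
    "NP": ["NN", "NNS", "NNP", "NNPS", "NX", "JJR", "CD", "JJ", "JJS", "RB", "QP", "NP"],
    "PP": ["IN", "TO"],   # simplification: treat the preposition itself as the head
}

def head_word(tree):
    """Percolate head rules down a constituent tree to find its lexical head."""
    if isinstance(tree, str):
        return tree
    if len(tree) == 1:
        return head_word(tree[0])
    for target in HEAD_RULES.get(tree.label(), []):
        for child in tree:
            if isinstance(child, Tree) and child.label() == target:
                return head_word(child)
    return head_word(tree[-1])    # fallback: rightmost child

t = Tree.fromstring("(NP (DT the) (JJ angry) (NN customer))")
print(head_word(t))   # customer
```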

Features: Argument Set
Also known as the Frame Element Group – the set of all roles appearing for a verb in a given sentence.
Intuition: when deciding on one role label, it is useful to know its place in the set as a whole.
Representation: {Agent/Patient/Theme}, {Speaker/Topic}
Approach: not used in training the system; instead, used after all roles are assigned to re-rank the role assignments for the entire sentence.

Features: Argument Order [Fleischman, 2003]
Description: an integer indicating the position of a constituent in the sequence of arguments.
Intuition: role labels typically occur in a common order.
Advantage: independent of parser output, thus robust to parser error.
'Can you blame the dealer for being late?' — arguments numbered 1, 2, 3 in sequence.

Features: Previous Role [Fleischman, 2003]
Description: the label assigned by the system to the previous argument.
Intuition: if we know what has already been labeled, we can better determine what the current label should be.
Approach: HMM-style Viterbi search to find the best overall sequence.

Features: Head Word Part of Speech [Surdeanu et al, 2003] Intuition: Penn Treebank POS labels differentiate singular/plural and proper/common nouns. This additional information helps refine the type of noun phrase for a role.

Features: Named Entities in Constituents [Pradhan, 2005]
Intuition: knowing the type of the entity allows better generalization, since the unlimited sets of proper names for people, organizations, and locations can lead to data sparsity.
Approach: run a named entity recognizer on the sentences and use the entity label as a feature.
Representation: words are identified as a type of entity such as PERSON, ORGANIZATION, LOCATION, PERCENT, MONEY, TIME, and DATE.

Features: Verb Clustering
Intuition: semantically similar verbs undergo the same patterns of argument alternation [Levin, 1993].
Representation: the constituent is labeled with a verb class discovered by clustering.
- 'He ate the cake.' {verb_class = eat}
- 'He devoured his sandwich.' {verb_class = eat}
Approach: perform automatic clustering of verbs based on their direct objects.
ML approaches: Expectation-Maximization, k-means.
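
A rough sketch of the clustering step with scikit-learn k-means; the (verb, direct object) pairs and cluster count below are toy placeholders, where a real system would mine millions of pairs from a parsed corpus:

```python
from collections import Counter
from sklearn.feature_extraction import DictVectorizer
from sklearn.cluster import KMeans

pairs = [("eat", "cake"), ("eat", "sandwich"), ("devour", "sandwich"),
         ("devour", "cake"), ("drive", "car"), ("steer", "car")]

verbs = sorted({v for v, _ in pairs})
obj_counts = {v: Counter(o for vv, o in pairs if vv == v) for v in verbs}

vec = DictVectorizer()
X = vec.fit_transform([obj_counts[v] for v in verbs])   # verbs as rows, objects as columns

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(dict(zip(verbs, clusters)))   # e.g. eat/devour in one cluster, drive/steer in another
```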

Features: Head Word of PPs
Intuition: prepositions often indicate certain semantic roles (e.g., in, across, and toward suggest Location; from suggests Source), but prepositions can be used in many different ways.
- 'We saw the play in New York' = Location
- 'We saw the play in February' = Time

Features: First/Last Word/POS in Constituent
Intuition: as with the head word of PPs, we want more specific information about an argument than the head word alone.
Advantages: more robust to parser error; applies to all types of constituents.
'[He] was born [in the final minutes of 2009]'
- Constituent 'He': first word/POS = He/PRP, last word/POS = He/PRP
- Constituent 'in the final minutes of 2009': first word/POS = in/IN, last word/POS = 2009/CD

Features: Constituent Order Intuition: Like argument order, but we want a way to differentiate constituents from non-constituents. Preference should go to constituents closer to the predicate.

Features: Constituent Tree Distance Description: the number of jumps necessary to get from the predicate to the constituent – like a path length Intuition: Like the Constituent Order, but factoring in syntactic structure

Features: Constituent Context Features Description: Information about the parent and left and right siblings of a constituent Intuition: Knowing a constituent’s place in the sentence helps determine the role.

Features: Constituent Context Features
Example values from 'He left town yesterday' (for the constituent 'town'):
- Parent phrase type: VP; parent head word: left; parent head word POS: VBD
- Left sibling: None
- Right sibling phrase type: NP; right sibling head word: yesterday; right sibling head word POS: NN

Features: Temporal Cue Words
Intuition: some words indicate time but are not considered named entities by the named entity tagger.
Approach: words are matched against a list of temporal cue expressions (e.g., 'moment', 'wink of an eye', …, 'around the clock') and included as binary features (value 1 when matched).

Evaluation
Precision – the percentage of labels output by the system which are correct
Recall – the percentage of true labels correctly identified by the system
F-measure (F_beta) – the harmonic mean of precision and recall
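
For reference, the three measures computed from gold and predicted label sets (a simplified sketch; the example tuples are made up, and shared-task scorers such as the CoNLL script additionally match argument spans and labels jointly):

```python
def precision_recall_f1(gold, predicted):
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0   # harmonic mean of P and R
    return p, r, f1

gold = {("Arg0", "John"), ("Rel", "mopped"), ("Arg1", "the floor")}
pred = {("Arg0", "John"), ("Rel", "mopped"), ("Arg1", "floor")}
print(precision_recall_f1(gold, pred))   # (0.666..., 0.666..., 0.666...)
```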

Evaluation
Why all these measures? To keep us honest.
Together, precision and recall capture the tradeoffs made in performing a classification task:
- 100% precision is easy on a small subset of the data
- 100% recall is easy if everything is included
Consider a doctor deciding whether or not to perform an appendectomy:
- They can claim 100% precision if surgery is only performed on patients who have been administered a complete battery of tests
- They can claim 100% recall if surgery is given to all patients

Evaluation
Lots of choices when evaluating SRL:
- Arguments: entire span vs. headword only
- Predicates: given vs. identified by the system

Evaluation
Sentence: 'John mopped the floor with the dress Mary bought while studying and traveling in Thailand.'
Labels compared include, for example, gold Arg2: 'with the dress … Thailand' vs. system Arg2: 'the dress' (a headword match but not a full-span match), along with Arg0: John, Rel: mopped, Arg1: the floor, Arg0: Mary, Rel: bought, Arg1: the dress, Rel: studying, ArgM-LOC: in Thailand, Rel: traveling.
Evaluated on the full argument span (8 correct):
- Precision P = 8 correct / 10 labeled = 80.0%
- Recall R = 8 correct / 13 possible = 61.5%
- F-measure F1 = 2PR / (P + R) = 69.5%
Evaluated on the argument headword (9 correct):
- Precision P = 9 correct / 10 labeled = 90.0%
- Recall R = 9 correct / 13 possible = 69.2%
- F-measure F1 = 2PR / (P + R) = 78.2%

Alternative Representations: Dependency Parse
Dependency parses provide much simpler graphs between the arguments.

Dependency Parse
Example: 'He ate some pancakes' — nsubj(ate, He), dobj(ate, pancakes), det(pancakes, some).

Alternative Representations: Syntactic Chunking [Hacioglu et al, 2005]
Also known as partial parsing.
A classifier is trained and used to assign BIO tags: B = begin, I = inside, O = outside.
Example: 'Sales declined 10% to $251.2 million from $278.7 million.'
Chunks: [NP Sales] [VP declined] [NP 10 %] [PP to] [NP $ 251.2 million] [PP from] [NP $ 278.7 million]
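
A small sketch of the token/BIO representation these chunk-based SRL systems tag over (the helper function and the chunking of the example sentence are illustrative):

```python
def chunks_to_bio(chunks):
    """chunks: list of (label, tokens); label None means outside any chunk."""
    tagged = []
    for label, tokens in chunks:
        for i, tok in enumerate(tokens):
            tag = "O" if label is None else ("B-" if i == 0 else "I-") + label
            tagged.append((tok, tag))
    return tagged

example = [("NP", ["Sales"]), ("VP", ["declined"]), ("NP", ["10", "%"]),
           ("PP", ["to"]), ("NP", ["$", "251.2", "million"]),
           ("PP", ["from"]), ("NP", ["$", "278.7", "million"]), (None, ["."])]
for token, tag in chunks_to_bio(example):
    print(f"{token}\t{tag}")
```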

Alternative Representations: Syntactic Chunking [Hacioglu et al, 2005]
Features: much overlap with the constituent-based features, plus
- Distance: the distance of the token from the predicate, measured as a number of base phrases; also the same distance measured as a number of VP chunks
- Clause position: a binary feature indicating whether the token is inside or outside the clause that contains the predicate