
Page 1 Starting from Scratch in Semantic Role Labeling Michael Connor, Yael Gertner, Cynthia Fisher, Dan Roth

Page 2 Topid rivvo den marplox. How do we acquire language?

Page 3 The language-world mapping problem: “the language” ([Topid rivvo den marplox.]) must be mapped onto “the world”.

Page 4 Observe how words are distributed across situations (Scene 1 ... Scene 3 ... Scene n): Smur! Rivvo della frowler. Topid rivvo den marplox. Marplox dorinda blicket. Blert dor marplox, arno.

Page 5 Structure-mapping: A proposed starting point for syntactic bootstrapping. Children can learn the meanings of some nouns via cross-situational observation alone [Fisher, 1996; Gillette, Gleitman, Gleitman, & Lederer, 1999; Snedeker & Gleitman, 2005]. But how do they learn the meanings of verbs? Sentence comprehension is grounded by the acquisition of an initial set of concrete nouns. These nouns yield a skeletal sentence structure: the candidate arguments, a cue to the sentence's semantic predicate-argument structure. Represent the sentence in an abstract form that permits generalization to new verbs. [Johanna rivvo den sheep.] (Nouns identified.)

Page 6 Strong Predictions [Gertner & Fisher, 2006]. Test 21-month-olds on assigning arguments to novel verbs (preferential looking paradigm): how the order of nouns influences interpretation, transitive vs. intransitive. Transitive: The boy is daxing the girl! Agent-first: The boy and the girl are daxing! Agent-last: The girl and the boy are daxing! The error disappears by 25 months.

Page 7 Current Project: BabySRL. A realistic computational model for syntactic bootstrapping via structure mapping: verb meanings are learned via their syntactic argument-taking roles, and semantic feedback improves the syntactic and meaning representations. Develop a semantic role labeling system (BabySRL) to experiment with theories of early language acquisition: SRL as a minimal level of language understanding, determining who does what to whom. Inputs and knowledge sources: only those we can defend children have access to.

Page 8 BabySRL: Key Components. Representation: a theoretically motivated representation of the input; a shallow, abstract sentence representation consisting of the number of nouns in the sentence, noun patterns (e.g., 1st of two nouns), and the relative position of nouns and predicates. Learning: guided by knowledge kids have; classify words by part of speech, identify arguments and predicates, and determine the roles arguments take.

Page 9 BabySRL: Early Results [Connor et al., CoNLL'08, '09]. Fine-grained experiments with how language is represented; test different levels of representation. Primary focus on the noun pattern (NPattern) feature. Hypothesis: the number and order of nouns are important; once we know some nouns, we can use them to represent structure. NPattern gives count and placement: first of two, second of three, etc. Alternative: verb position, i.e., whether the target argument is before or after the verb. Key finding: NPattern reproduces errors seen in children, promoting the A0-A1 interpretation in transitive but also intransitive sentences; verb position does not make this error, and incorporating it recovers the correct interpretation. But: this was done with manually labeled data, and the feedback varies.

Page 10 BabySRL: Key Components. Representation: a theoretically motivated representation of the input; a shallow, abstract sentence representation consisting of the number of nouns in the sentence, noun patterns (e.g., 1st of two nouns), and the relative position of nouns and predicates. Learning: guided only by knowledge kids have; classify words by part of speech, identify arguments and predicates, and determine the roles arguments take.

Page 11 This work: Minimally Supervised BabySRL. Goal: unsupervised “parsing” for identifying arguments; provide only limited prior knowledge and high-level semantic feedback, defensible from psycholinguistic evidence. Overview: unsupervised parsing (identifying part-of-speech states); argument identification (identify argument states, identify predicate states); argument role classification (labeled training using unsupervised arguments); results and comparison to child experiments.

Page 12 BabySRL Overview. Traditional approach: parse the input, identify arguments, classify arguments, global inference over arguments. Each stage has its own classifier/knowledge source, with finely labeled training data throughout. Example: "She always has salad." is chunked as [NP She] [VP always has] [NP salad], with "has" as the verb/predicate and roles [A0 She], [A1 salad].

Page 13 BabySRL Overview. Unsupervised approach: unsupervised HMM, identify argument states, classify arguments, no global inference. Labeled training is used only for the argument classifier; the rest is driven by simple background knowledge. Example: "She always has salad" with tags N, V, N on She, has, salad and roles A0 (She), A1 (salad).

Page 14 Unsupervised Parsing. We want to generate a representation that permits generalization over word forms, so we incorporate distributional similarity via a context-sensitive Hidden Markov Model (HMM). It is a simple model that essentially provides part-of-speech information, but without names for the states; we need to figure those out. Train on child-directed speech from the CHILDES repository: around 1 million words, across multiple children.

Page 15 Unsupervised Parsing (II). The standard way to train an unsupervised HMM: simple EM produces uniform-size clusters; the usual solution is to include priors for sparsity, e.g. a Dirichlet prior (Variational Bayes, VB). We replace this with psycholinguistically plausible knowledge: knowledge of function words. Function and content words have different statistics, and there is evidence that even newborns can make this distinction; we don't use prosody, but it may provide this cue. Technically: allocate a number of HMM states to function words and leave the rest to the remaining words (see the sketch below). This is done before parameter estimation and can be combined with EM or VB learning: EM+Func, VB+Func.
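Below is a minimal sketch, in Python, of what "allocating a number of states to function words" before parameter estimation could look like. The function-word list, the 80-state / 30-function-state split, and the helper name are illustrative assumptions, not the authors' implementation; the point is that zeros placed in the emission matrix before training are preserved by EM (Baum-Welch) re-estimation, so the reserved states stay dedicated to function words.

```python
import numpy as np

# Minimal sketch (not the authors' code) of reserving HMM states for function
# words before parameter estimation. Word list, state counts, and vocabulary
# below are illustrative toy values.

FUNCTION_WORDS = {"the", "a", "is", "are", "of", "to", "and", "you", "it", "?", "."}

def init_emissions(vocab, n_states=80, n_function_states=30, seed=0):
    """Random emission matrix whose zeros encode the function/content split.

    Baum-Welch (EM) re-estimation preserves structural zeros in the emission
    probabilities, so reserved states never come to emit content words; for VB
    training the same mask would also have to be applied to the prior counts.
    """
    rng = np.random.default_rng(seed)
    B = rng.random((n_states, len(vocab)))
    for j, word in enumerate(vocab):
        if word in FUNCTION_WORDS:
            B[n_function_states:, j] = 0.0   # content states never emit function words
        else:
            B[:n_function_states, j] = 0.0   # function states never emit content words
    return B / B.sum(axis=1, keepdims=True)  # normalize each row into a distribution

# Example: emission matrix for a toy vocabulary, ready to hand to EM or VB training.
vocab = sorted({"the", "dog", "is", "eating", "a", "ball", "."})
B = init_emissions(vocab)
```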

Page 16 Unsupervised Parsing Evaluation. Test as an unsupervised POS tagger on a subset of hand-corrected CHILDES data. Incorporating function-word pre-clustering allows both EM and VB to achieve the same performance with an order of magnitude fewer sentences. EM: HMM trained with EM. VB: HMM trained with Variational Bayes and a Dirichlet prior. EM+Funct, VB+Funct: the same training methods with function-word pre-clustering. (Plot: variation of information, lower is better, against the number of training sentences.)
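For reference, here is a small, self-contained implementation of the variation of information metric used in this evaluation, VI(X;Y) = H(X) + H(Y) - 2 I(X;Y); the labelings in the example are made up for illustration.

```python
import math
from collections import Counter

# Variation of information (VI) between two clusterings of the same tokens,
# used here to compare induced HMM states against gold POS tags (lower is better).

def variation_of_information(labels_a, labels_b):
    n = len(labels_a)
    pa, pb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    h_a = -sum((c / n) * math.log(c / n) for c in pa.values())
    h_b = -sum((c / n) * math.log(c / n) for c in pb.values())
    mi = sum((c / n) * math.log((c / n) / ((pa[a] / n) * (pb[b] / n)))
             for (a, b), c in joint.items())
    return h_a + h_b - 2 * mi

# Example: induced HMM states vs. gold POS tags for five tokens.
induced = [46, 48, 26, 74, 2]
gold = ["PRP", "RB", "VBZ", "NN", "."]
print(variation_of_information(induced, gold))   # ~0.0: the clusterings match one-to-one
```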

Page 17 Argument Identification. Now we have a parser that gives us the state (cluster) each word belongs to; next, identify the states that correspond to arguments and predicates. Knowledge: we provide a list of frequent nouns. As few as 10 nouns cover 60% of occurrences, mostly pronouns, and there is a lot of evidence that children know and recognize nouns early on. Algorithm: states that appear over half the time with known nouns are treated as argument states (sketched below). This assumes that nouns = arguments.
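A minimal sketch of this argument-state heuristic, assuming the HMM tagger's output is available as (word, state) pairs; the seed-noun list and toy corpus are illustrative, and the state ids are borrowed from the example on the next slide.

```python
from collections import Counter

# An HMM state is treated as an argument state if more than half of its
# occurrences are on words from the known-noun (seed) list.

SEED_NOUNS = {"you", "it", "i", "what", "he", "me", "she", "daddy", "salad"}

def argument_states(tagged_corpus, seed_nouns=SEED_NOUNS, threshold=0.5):
    total, with_noun = Counter(), Counter()
    for sentence in tagged_corpus:                  # sentence: list of (word, hmm_state)
        for word, state in sentence:
            total[state] += 1
            if word.lower() in seed_nouns:
                with_noun[state] += 1
    return {s for s in total if with_noun[s] / total[s] > threshold}

corpus = [
    [("She", 46), ("always", 48), ("has", 26), ("salad", 74), (".", 2)],
    [("He", 46), ("sees", 26), ("it", 74), (".", 2)],
]
print(argument_states(corpus))   # {46, 74}: the states dominated by known nouns
```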

Page 18 Argument Identification (example). "She always has salad." is tagged with HMM states: She/46 always/48 has/26 salad/74 ./2. Words that occurred with each state in CHILDES: state 46 {it, he, she, who, Fraser, Sarah, Daddy, Eve, ...}; state 48 {just, go, never, better, always, only, even, ...}; state 26 {have, like, did, has, ...}; state 74 {it, you, what, them, him, me, her, something, ...}; state 2 {. ? !}. Knowledge, frequent nouns: You, it, I, what, he, me, ya, she, we, her, him, who, Ursula, Daddy, Fraser, baby, something, head, chair, lunch, ...

Page 19 Predicate Identification. Nouns are concrete and can be identified; predicates are more difficult and are not learned easily via cross-situational observation. Structure-mapping account: sentence comprehension is grounded in the learning of an initial set of nouns, and verbs are identified based on their argument-taking behavior. Algorithm: identify predicate states as those states that tend to appear with a given number of arguments (see the sketch below), assuming one predicate per sentence.
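The sketch below is one plausible reading of this heuristic, not the authors' exact scoring: non-argument states are profiled by the number of identified arguments they co-occur with, and the single predicate chosen per sentence is the word whose state is most strongly tied to that sentence's argument count. State ids and the toy corpus follow the running example.

```python
from collections import Counter, defaultdict

# Rough, illustrative reconstruction of predicate-state identification.

def predicate_stats(tagged_corpus, arg_states):
    stats = defaultdict(Counter)                    # state -> Counter over argument counts
    for sentence in tagged_corpus:                  # sentence: list of (word, hmm_state)
        n_args = sum(1 for _, s in sentence if s in arg_states)
        for _, state in sentence:
            if state not in arg_states:
                stats[state][n_args] += 1
    return stats

def choose_predicate(sentence, arg_states, stats):
    n_args = sum(1 for _, s in sentence if s in arg_states)
    candidates = [(w, s) for w, s in sentence if s not in arg_states]
    # Relative frequency with which each candidate's state occurs with exactly
    # this many arguments; the highest-scoring candidate is the predicate.
    return max(candidates,
               key=lambda ws: stats[ws[1]][n_args] / max(1, sum(stats[ws[1]].values())))

corpus = [
    [("She", 46), ("always", 48), ("has", 26), ("salad", 74), (".", 2)],
    [("He", 46), ("sees", 26), ("it", 74), (".", 2)],
    [("You", 46), ("always", 48), ("smile", 30), (".", 2)],
]
stats = predicate_stats(corpus, arg_states={46, 74})
print(choose_predicate(corpus[0], {46, 74}, stats))   # ('has', 26)
```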

Page 20 Predicate Identification (example). "She always has salad." tagged: She/46 always/48 has/26 salad/74 ./2. (Table comparing candidate states 48 and 26 by how often each occurs in sentences with 1, 2, or 3 identified arguments; the exact counts are not recoverable from this transcript.)

Page 21 Argument Identification Results. Tested against hand-labeled argument boundaries on CHILDES child-directed speech, varying the number of seed nouns. (Plots: argument identification works well, predicate identification poorly, which has implications for the features' quality; differences between the parsing variants disappeared, so EM+Funct is used.)

Page 22 Finally: BabySRL Experiments. Given potential arguments and predicates, train the argument classifier: given an abstract representation of an argument and its predicate, determine the argument's role (Agent, Patient, etc.). To train, apply true labels to the noisily identified arguments; roles are relative to the predicate; the classifier is a regularized perceptron. Abstract representations (features) considered (see the sketch below): (1) Lexical: the target noun and the target predicate. (2) NounPattern, which depends only on the number and order of arguments: first of two, second of three, etc.; in "She always has salad", `She` is first of two and `salad` is second of two. (3) VerbPosition: `She` is before the verb, `salad` is after the verb.
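A short sketch of these three representations applied to "She always has salad", assuming the (noisily) identified argument positions and predicate position are given; the feature-name strings are illustrative, not the authors' exact encoding.

```python
# Lexical, NounPattern, and VerbPosition features for one candidate argument.

ORDINALS = ["first", "second", "third", "fourth"]

def extract_features(words, arg_indices, pred_index, target):
    """Features for one target argument, given identified arguments and predicate."""
    n = len(arg_indices)
    pos = arg_indices.index(target)
    return [
        f"word={words[target]}",                                        # (1) lexical: target noun
        f"pred={words[pred_index]}",                                     #     lexical: predicate
        f"npattern={ORDINALS[pos]}_of_{n}",                              # (2) e.g. "first of two"
        f"verbpos={'before' if target < pred_index else 'after'}_verb",  # (3) position vs. the verb
    ]

# "She always has salad": arguments at indices 0 and 3, predicate at index 2.
words = ["She", "always", "has", "salad"]
print(extract_features(words, [0, 3], 2, target=0))
# ['word=She', 'pred=has', 'npattern=first_of_2', 'verbpos=before_verb']
print(extract_features(words, [0, 3], 2, target=3))
# ['word=salad', 'pred=has', 'npattern=second_of_2', 'verbpos=after_verb']
```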

Page 23 BabySRL Experiments: Test Data. Unlike the previous experiments (on CHILDES), here we compare to psycholinguistic data. Evaluate on constructed two-noun sentences with novel verbs: test two-noun transitive vs. intransitive sentences, "A krads B" vs. "A and B krads", with A and B filled with known nouns; this tests generalization to an unknown verb. It reproduces experiments on young children: at 21 months of age, children make a mistake, interpreting A as agent and B as patient in both cases. We hypothesize that children at this stage represent sentences in terms of the number and order of nouns, not the position of the verb.

Page 24 BabySRL Experiments. NounPattern features promote the Agent-Patient (A0-A1) interpretation for both transitive (correct) and intransitive (incorrect) sentences; VerbPos pushes the intransitive case in the other direction, and this works for gold training. (Plot: %A0A1 on transitive vs. intransitive test sentences with 10 seed nouns, by parsing algorithm.) The model learns agent first, patient second, and reproduces the error on intransitives; with the noisy representation it does not recover even when VerbPos is available.

Page 25 Summary. BabySRL: a realistic computational model of verb meaning acquisition via the structure-mapping theory; representational issues; unsupervised learning driven by minimal plausible knowledge sources. Even with noisy parsing and argument identification, it is able to learn the abstract rule: agent first, patient second. The difficulty of identifying predicates harms the usefulness of the superior representation (VerbPos). It reproduces errors seen in children. Next steps: use correct high-level semantic feedback to improve the earlier identification and parsing decisions, i.e., improve the VerbPos feature; relax the correctness of the semantic feedback. Thank you.

Page 26 How do we acquire language? (Scene labels: Dog, Balls.) The Dog kradz the balls.

Page 27 BabySRL Experiments. (Plots: transitive vs. intransitive results with 10 seed nouns and with 365 seed nouns.) Even with VerbPosition available, with the noisy parse the model makes the error on intransitive sentences.