The origin of syntactic bootstrapping: A computational model
Michael Connor, Cynthia Fisher, & Dan Roth
University of Illinois at Urbana-Champaign

How do children begin to understand sentences? Topid rivvo den marplox.

Two problems of ambiguity
"the language": Topid rivvo den marplox.
"the world": eating(sheep, food), feed(Johanna, sheep), help(Daddy, Johanna), scared(Johanna, sheep), are-nice(sheep), want(sheep, food), getting-away(brother)

"the sentence" "the world" Typical view: knowledge of meaning drives knowledge of syntax Topid rivvo den marplox. feed (Johanna, sheep)

"the language" "the world" Typical view: knowledge of meaning drives knowledge of grammar Topid rivvo den marplox. feed (Johanna, sheep) With or without assumed innate constraints on syntax- semantics linking (e.g. Pinker, 1984, 1989; Tomasello, 2003) Sentence-meaning pairs

The problem of sentence and verb meanings
Predicate terms don't label events, but adopt different perspectives on them (e.g., Clark, 1990; Fillmore, 1977; Gillette, Gleitman, Gleitman, & Lederer, 1999; Rispoli, 1989; Slobin, 1985): "I'll put this here." versus "This goes here."

"the sentence" "the world" If we take seriously the ambiguity of scenes... Topid rivvo den marplox. eating (sheep, food) feed (Johanna, sheep) help (Daddy, Johanna) scared (Johanna, sheep) are-nice (sheep) want (sheep, food) getting away (brother)

Syntactic bootstrapping (Gleitman, 1990; Gleitman et al., 2005; Landau & Gleitman, 1985; Naigles, 1990)
"I'll put this here." versus "This goes here."
Sentence structure → verbs' semantic predicate-argument structure

How could this be? How could aspects of syntactic structure guide early sentence interpretation... before the child has learned much about the syntax and morphology of this language?

Three main claims:
(1) Structure-mapping: Syntactic bootstrapping begins with a bias toward one-to-one mapping between nouns in sentences and semantic arguments of predicate terms. (Fisher et al., 1994; Fisher et al., 2006; Gillette et al., 1999; Yuan, Fisher, & Snedeker, in press)
(2) Early abstraction: Children are biased toward abstract representations of language experience. (Gertner, Fisher & Eisengart, 2006; Thothathiri & Snedeker, 2008)
(3) Independent encoding of sentence structure: Children gather distributional facts about verbs from listening experience. (Arunachalam & Waxman, 2010; Scott & Fisher, 2009; Yuan & Fisher, 2009)

A fourth prediction:
(4) 'Real' syntactic bootstrapping: Children can learn which words are verbs by tracking their syntactic argument-taking behavior in sentences.

A computational model of syntactic bootstrapping
- Computational experiments simulate a learner whose knowledge of sentence structure is under our control.
- We equip the model with an unlearned bias to map nouns onto abstract semantic roles, and ask: are partial representations based on sets of nouns useful for learning more about the native language?
- First case study: learning English word order. Could a learner identify verbs as verbs by noting their tendency to occur with particular numbers of arguments?

Computational model of Semantic Role Labeling (SRL)
A semantic analysis of sentences at the level of who does what to whom. For each verb in the sentence, the SRL system tries to identify all constituents that fill a semantic role, and to assign them roles (agent, patient, goal, etc.).
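As a concrete illustration, the target analysis for one sentence might be rendered as below. This is a hypothetical data format for exposition, not the authors' implementation:

```python
# A hypothetical rendering of SRL output: for each verb, the
# constituents that fill its semantic roles, labeled with abstract
# PropBank-style role names (A0, A1, ...).
sentence = "I like dark bread"
propositions = [{
    "verb": "like",
    "args": {
        "A0": "I",           # agent-like role: the liker
        "A1": "dark bread",  # patient/theme-like role: the thing liked
    },
}]
```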

Semantic Role Labeling (SRL): the basic idea
[Pipeline diagram: Parsing → (1) Argument Identification → (2) Predicate Identification → Feature Extraction → Argument Classification, trained with external labeled feedback. For "I like dark bread.": arg = I (NP under S), verb = like → A0 ("I is an agent"); arg = bread (NP under VP), verb = like → A1 ("bread is a patient").]

The nature of the semantic roles
A key assumption of the SRL is that the semantic roles are abstract. This is a property of the PropBank annotation scheme: verb-specific roles are grouped into macro-roles (Palmer, Gildea, & Kingsbury, 2005; cf. Dowty, 1991).
Give: Arg0 = Giver, Arg1 = Thing given, Arg2 = Recipient
Like: Arg0 = Liker, Arg1 = Object of affection
Have: Arg0 = Owner, Arg1 = Possession
Across verbs, Arg0 corresponds to the Agent macro-role and Arg1 to the Patient/Theme. So, like children, an SRL learns to identify agents and patients in sentences, not givers and things-given.
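Sketched as data (entries paraphrased from the slide), the abstraction amounts to verb-specific roles collapsing onto shared labels:

```python
# Verb-specific roles collapse onto shared PropBank labels, so one
# classifier can serve all verbs: A0 ~ Agent, A1 ~ Patient/Theme.
FRAMES = {
    "give": {"A0": "giver", "A1": "thing given", "A2": "recipient"},
    "like": {"A0": "liker", "A1": "object of affection"},
    "have": {"A0": "owner", "A1": "possession"},
}
```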

This SRL knows the grammar... and reads minds
[Same pipeline as above: gold-standard parsing identifies the arguments, and external labeled feedback ("I is an agent, A0; bread is a patient, A1") supplies the interpretation.]

This SRL knows the grammar... and reads minds. But what are they talking about? Topid rivvo den marplox.
[Same pipeline with everything unknown: arg = ???, verb = ???]

Baby SRL
[Pipeline diagram for "I like dark bread.": (1) An HMM trained on child-directed speech (CDS) tags the words with states 58-I, 78-like, 50-dark, 48-bread; seed nouns mark states 58 and 48 as nouns (N) for argument identification. Sample cluster contents: 58: I, you, ya, Kent; 78: like, got, did, had; 50: many, nice, good; 48: more, juice, milk. (2) Argument-count statistics mark state 78 as the verb (V): it occurs with one noun 13% and two nouns 54% of the time, versus 41% and 31% for state 50. Feature extraction: arg = I, 1st of two, before verb; arg = bread, 2nd of two, after verb; verb = like. Argument classification with external labeled feedback: I is an agent (A0); bread is a patient (A1).]

Baby SRL with internal animacy feedback (preview)
[Same pipeline, but external labeled feedback is replaced by internal animacy feedback: "I is animate → A0; bread is unknown, and the agent is I → not A0."]

Unsupervised part-of-speech clustering using a Hidden Markov Model (HMM)
- Yields context-sensitive clustering of words based on sequential dependencies (cf. Bannard et al., 2004; Chang et al., 2006; Mintz, 2003; Solan et al., 2005)
- A priori split between content and function words, based on phonological differences (e.g., Shi et al., 1998; Connor et al., 2010): 80 hidden states, 30 pre-allocated to function words (see the sketch below)
- Trained on ~1 million words of unlabeled text from CHILDES
[Example tagging: 58-I, 78-like, 50-dark, 48-bread; cluster samples: 58: I, you, ya, Kent; 78: like, got, did, had; 50: many, nice, good; 48: more, juice, milk]
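One way to realize the pre-allocation is to constrain the HMM's emission matrix before EM training. The state counts come from the slide; the initialization scheme below is our assumption, not the authors' code:

```python
# A minimal sketch of the a priori content/function split.
import numpy as np

N_STATES, N_FUNC = 80, 30   # 30 of 80 hidden states reserved for function words

def init_emissions(vocab, function_words, rng=np.random.default_rng(0)):
    """Random emission matrix with an a priori content/function split.
    Zero entries stay zero under Baum-Welch re-estimation, so the split
    imposed here survives unsupervised EM training."""
    B = rng.random((N_STATES, len(vocab)))
    for j, word in enumerate(vocab):
        if word in function_words:
            B[N_FUNC:, j] = 0.0   # content states never emit function words
        else:
            B[:N_FUNC, j] = 0.0   # function states never emit content words
    return B / B.sum(axis=1, keepdims=True)
```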

Linking rules
But these clusters are unlabeled. How do we link unlabeled clusters to particular grammatical categories, and thus to particular roles in sentence interpretation (SRL)?
- nouns, which are candidate arguments
- verbs, which take arguments

Linking rules: Nouns first
- Syntactic bootstrapping is grounded in noun learning (e.g., Gillette et al., 1999): children learn the meanings of some nouns without syntactic knowledge.
- Each noun is assumed to be a candidate argument: a simple form of semantic bootstrapping (Pinker, 1984).
- Seed nouns (sets of 10 to 75) are sampled from the M-CDI (MacArthur-Bates CDI); HMM states that emit seed nouns are treated as noun states (N), giving argument identification (sketched below).
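A minimal sketch of this noun linking, assuming the HMM tagger returns (word, state) pairs; the seed list here is illustrative:

```python
SEED_NOUNS = {"I", "you", "juice", "milk", "bread"}  # illustrative M-CDI sample

def noun_states(tagged_corpus, seed_nouns):
    """An HMM state counts as a noun state if it ever emits a seed noun."""
    return {state for word, state in tagged_corpus if word in seed_nouns}

def identify_arguments(tagged_sentence, n_states):
    """Candidate arguments are the positions tagged with a noun state."""
    return [i for i, (_, s) in enumerate(tagged_sentence) if s in n_states]
```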

Linking rules: What about verbs?
- Verbs take noun arguments.
- Via syntactic bootstrapping, children identify a word as a verb by tracking its argument-taking behavior in sentences.

Linking rules: What about verbs? (continued)
- Collect statistics over the HMM training corpus: how often does each state occur in sentences with a particular number of arguments?
- For each SRL input sentence, choose as the predicate the word whose state is most likely to appear with the number of arguments found in the current sentence (sketched below).
[Example: state 78 (like, got, did, had) occurs with one noun 13% and two nouns 54% of the time; state 50 (many, nice, good) occurs with one noun 41% and two nouns 31%. In "I like dark bread," with two identified arguments, state 78 wins, so like is tagged V.]
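A sketch of this two-step procedure (collect statistics, then pick the predicate). The data structures and the restriction of candidates to non-noun states are our assumptions:

```python
from collections import Counter, defaultdict

def argument_count_stats(corpus, noun_states):
    """For each HMM state: P(the sentence contains k arguments | state)."""
    counts = defaultdict(Counter)
    for sent in corpus:                          # sent: list of (word, state)
        k = sum(1 for _, s in sent if s in noun_states)
        for _, s in sent:
            counts[s][k] += 1
    stats = {}
    for s, ctr in counts.items():
        total = sum(ctr.values())
        stats[s] = {k: c / total for k, c in ctr.items()}
    return stats

def identify_predicate(sent, noun_states, stats):
    """Choose the word whose state is most likely to occur with this
    sentence's argument count (candidates limited to non-noun states)."""
    k = sum(1 for _, s in sent if s in noun_states)
    candidates = [(w, s) for w, s in sent if s not in noun_states]
    word, _ = max(candidates, key=lambda ws: stats.get(ws[1], {}).get(k, 0.0))
    return word
```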

Baby SRL: a partial sentence representation
[Pipeline as before. Feature extraction yields: arg = I, 1st of two nouns, before verb; arg = bread, 2nd of two nouns, after verb; verb = like. This is a partial sentence representation. With external labeled feedback, the classifier is told: I is an agent (A0); bread is a patient (A1).]

Semantic-role feedback???
Theories of language acquisition assume learning to understand sentences is a partially-supervised task:
- use existing knowledge of words & syntax to assign a meaning to a sentence;
- the appropriateness of the meaning in context then provides the feedback.
[Johanna rivvo den sheep.]

Baby SRL with animacy-based feedback
Bootstrapped animacy feedback (from a list of known concrete nouns):
1. Animates are agents; inanimates are non-agents.
2. Each verb has at most one agent.
3. If there are two known animate nouns, use the current SRL classifier to choose the best agent (and use this decision as feedback to train the SRL).
Note: Animacy feedback trains a simplified classifier, with only an agent/non-agent role distinction ("I is animate → A0; bread is unknown, and the agent is I → not A0"). A sketch of the heuristic follows.
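A minimal sketch of the feedback heuristic, assuming candidate arguments are already identified. The animacy lists and the classifier hook (`score_agent`) are illustrative, not the authors' implementation:

```python
ANIMATE = {"I", "you", "Mommy", "Daddy", "Kent"}   # hypothetical known animates
INANIMATE = {"bread", "juice", "milk", "ball"}     # hypothetical known inanimates

def animacy_feedback(args, classifier=None):
    """Pseudo-labels {arg_index: 'A0' | 'notA0'} for one verb's arguments.
    `args` is a list of (index, word) pairs."""
    animates = [i for i, w in args if w in ANIMATE]
    if len(animates) == 1:
        agent = animates[0]                  # rule 1: the lone animate is agent
    elif len(animates) >= 2 and classifier is not None:
        agent = max(animates, key=classifier.score_agent)  # rule 3: tie-break
    else:
        # no usable agent decision: label only known inanimates (rule 1)
        return {i: "notA0" for i, w in args if w in INANIMATE}
    # rule 2: at most one agent per verb, so every other argument, even an
    # unknown word like "bread" on the slide, is labeled notA0
    return {i: ("A0" if i == agent else "notA0") for i, _ in args}
```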

Baby SRL with internal animacy feedback
[Same pipeline, but the classifier now distinguishes only A0 vs. not-A0, and the feedback is internal: "I is animate → A0; bread is unknown, but the agent is I → not A0."]

Training the Baby SRL
Three corpora of child-directed speech: parental utterances annotated using the PropBank annotation scheme; speech to Adam, Eve, and Sarah (Brown, 1973).
- Adam (2;3 - 3;2), trained on files 01-20: 3,951 propositions, 8,107 arguments
- Eve (1;6 - 2;3), trained on files 01-18: 4,029 propositions, 8,499 arguments
- Sarah (2;3 - 4;1), trained on files 01-83: 8,570 propositions

Testing the Baby SRL
- Constructed test sentences like those used in experiments with children.
- Unknown verbs and two animate nouns ("Adam krads Daddy!") force the SRL to rely on syntactic knowledge.
[Test vocabulary: novel verb krad; familiar animate nouns Adam, Mommy, Daddy, Ursula, ...]

Testing the Baby SRL
Compare systems trained on different syntactic features derived from the partial sentence representation (see the feature sketch below):
- Lexical baseline: verb and argument word features only (arg = I, verb = like)
- Noun pattern: lexical features plus features such as '1st of 2 nouns', '2nd of 2 nouns'
- Verb position: lexical features plus 'before verb', 'after verb'
- Combined: all feature types
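A sketch of the four feature sets, using illustrative function and feature names:

```python
def extract_features(words, arg_indices, verb_index, feature_set="combined"):
    """Per-argument features from the partial sentence representation."""
    n = len(arg_indices)
    features = {}
    for rank, i in enumerate(arg_indices, start=1):
        f = [f"arg={words[i]}", f"verb={words[verb_index]}"]  # lexical baseline
        if feature_set in ("noun_pattern", "combined"):
            f.append(f"{rank}_of_{n}_nouns")      # e.g. '1_of_2_nouns'
        if feature_set in ("verb_position", "combined"):
            f.append("before_verb" if i < verb_index else "after_verb")
        features[i] = f
    return features

# "I like dark bread": arguments at positions 0 and 3, verb at position 1.
print(extract_features(["I", "like", "dark", "bread"], [0, 3], 1))
```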

Question 1: Are partial sentence representations useful, in principle, for learning new syntactic knowledge?
Learning English word order. Start with "gold standard" part-of-speech tagging and "gold standard" semantic role feedback.

Are partial sentence representations useful as a starting point for learning?
We assume that representations as simple as an ordered set of nouns can yield useful information about verb argument structure and sentence interpretation. Is this true in ordinary samples of language use, despite all the noise this simple representation will add to the data?

Result 1: Partial sentence representations are useful for further syntax learning
[Results plot for test sentences of the form "A krads B"]

Question 2: Can partial sentence representations be built via unsupervised clustering... and the proposed linking rules?
[Pipeline as before: HMM trained on CDS; seed nouns for argument identification; argument-count statistics for predicate identification.]

Result 2: Partial sentence representations can be built via unsupervised clustering
[Results plot: performance as a function of the number of known seed nouns]

Question 3: Now for the Baby SRL
- Are partial sentence representations built via unsupervised clustering useful for learning new syntactic knowledge? With animacy-based feedback?
- Learning English word order, starting from scratch...

Result 3: Partial sentence representations based on unsupervised clustering are useful

SRL summary
- Result 1: Partial sentence representations are useful for further syntax learning (gold arguments & feedback).
- Result 2: Partial sentence representations can be built via unsupervised clustering and the proposed linking rules: semantic bootstrapping for nouns, 'real' syntactic bootstrapping for verbs.
- Result 3: Partial sentence representations based on unsupervised clustering are useful for further syntax learning: robust learning (e.g., that verbs precede objects in English) based on a noisy unsupervised 'parse' of sentences and noisy animacy-based feedback.

Three main claims:
(1) Structure-mapping: Syntactic bootstrapping begins with an unlearned bias toward one-to-one mapping between nouns in sentences and semantic arguments of predicate terms.
(2) Early abstraction: Children are biased toward usefully abstract representations of language experience.
(3) Independent encoding of sentence structure: Children gather distributional facts about new verbs from listening experience.

A fourth prediction:
(4) 'Real' syntactic bootstrapping: Children learn which words are the verbs by tracking their syntactic argument-taking behavior in sentences.
Hey, she pushed her. / Will you push me on the swing? / John pushed the cat off the sofa...
→ Verb = push [noun1, noun2]: a 2-participant relation

Acknowledgements
Yael Gertner, Jesse Snedeker, Lila Gleitman, Soondo Baek, Kate Messenger, Sylvia Yuan, Rose Scott
Funding: NSF, NICHD