
A Bayesian Approach to the Poverty of the Stimulus Amy Perfors MIT With Josh Tenenbaum (MIT) and Terry Regier (University of Chicago)

Innate / Learned

Innate / Learned; Explicit structure / No explicit structure

Language has hierarchical phrase structure: No / Yes

Why believe that language has hierarchical phrase structure?
- Formal properties plus an information-theoretic, simplicity-based argument (Chomsky, 1956)
- Dependency structure of language: a finite-state grammar cannot capture the infinite sets of English sentences with dependencies like this
- If we restrict ourselves to only a finite set of sentences, then in theory a finite-state grammar could account for them, "but this grammar will be so complex as to be of little use or interest."

Why believe that structure dependence is innate? The Argument from the Poverty of the Stimulus (PoS):
Data
- Simple declarative: The girl is happy. / They are eating.
- Simple interrogative: Is the girl happy? / Are they eating?
Hypotheses
1. Linear: move the first "is" (auxiliary) in the sentence to the beginning
2. Hierarchical: move the auxiliary in the main clause to the beginning
Test
- Complex declarative: The girl who is sleeping is happy.
Result
- Children say: Is the girl who is sleeping happy?
- NOT: *Is the girl who sleeping is happy?
(Chomsky, 1965, 1980; Crain & Nakayama, 1987)

Why believe it's not innate?
- There are actually enough complex interrogatives (Pullum & Scholz, 2002)
- Children's behavior can be explained via statistical learning of natural language data (Lewis & Elman, 2001; Reali & Christiansen, 2005)
- It is not necessary to assume a grammar with explicit structure

Our argument

Our argument
- We suggest that, contra the PoS claim, it is possible, given the nature of the input and certain domain-general assumptions about the learning mechanism, for an ideal, unbiased learner to realize that language has a hierarchical phrase structure; therefore this knowledge need not be innate
- The reason: grammars with hierarchical phrase structure offer an optimal tradeoff between simplicity and fit to natural language data

Plan
- Model
  - Data: corpus of child-directed speech (CHILDES)
  - Grammars: linear & hierarchical; both hand-designed & the result of local search; linear also via automatic, unsupervised ML
  - Evaluation: complexity vs. fit
- Results
- Implications

The model: Data
- Corpus from the CHILDES database (Adam, Brown corpus)
- 55 files, age range 2;3 to 5;2
- Sentences spoken by adults to children
- Each word replaced by its syntactic category: det, n, adj, prep, pro, prop, to, part, vi, v, aux, comp, wh, c
- Ungrammatical sentences and the most grammatically complex sentence types were removed: topicalized sentences (66), sentences with serial verb constructions (459), subordinate phrases (845), sentential complements (1636), conjunctions (634), and ungrammatical sentences (444)
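As a concrete illustration of this preprocessing step, the sketch below (hypothetical code and toy lexicon, not the authors' pipeline) maps each word of an utterance to one of the syntactic categories above, yielding the tag sequences that the grammars are evaluated on.

```python
# Hypothetical illustration of the preprocessing described above: replace each
# word in a child-directed utterance with its syntactic category, keeping only
# the resulting tag sequence. The lexicon here is a toy stand-in.

TOY_LEXICON = {
    "you": "pro", "can": "aux", "see": "v",
    "the": "det", "big": "adj", "ball": "n",
}

def to_tag_sequence(utterance: str) -> str:
    """Return the space-separated category sequence for an utterance."""
    return " ".join(TOY_LEXICON[word] for word in utterance.lower().split())

print(to_tag_sequence("You can see the big ball"))  # -> pro aux v det adj n
```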

Data
- Final corpus contained 2336 individual sentence types corresponding to sentence tokens

Data: variation
- Amount of evidence available at different points in development

Data: variation
- Amount of evidence available at different points in development
- Amount comprehended at different points in development

Data: amount available
- Rough estimate: the corpus is split by age into five cumulative epochs

  Epoch   Age range      % of sentence types
  1       2;3 to 2;…     …
  2       2;3 to 3;1     55%
  3       2;3 to 3;5     74%
  4       2;3 to 4;2     89%
  5       2;3 to 5;2     100%

  (The slide also lists the number of files and the number of sentence types per epoch.)

Data: amount comprehended
- Rough estimate: the corpus is split into six levels by sentence-type frequency

  Level   % of all types   % of tokens
  1       0.3%             28%
  2       1.6%             55%
  3       2.9%             64%
  4       4.9%             71%
  5       12%              82%
  6       100%             100%

  (The slide also lists the frequency cutoff and the number of sentence types at each level.)

The model
- Data: child-directed speech (CHILDES)
- Grammars: linear & hierarchical; both hand-designed & the result of local search; linear also via automatic, unsupervised ML
- Evaluation: complexity vs. fit

Grammar types
- Hierarchical
  - Context-free grammar: rules of the form NT → NT NT, NT → t NT, NT → NT, NT → t
- Linear
  - Regular grammar: rules of the form NT → t NT, NT → t
  - "Flat" grammar: a list of each sentence
  - 1-state grammar: anything accepted
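To make the distinction concrete, here is a minimal sketch (a hypothetical representation, not the talk's implementation) of how the linear and hierarchical grammar classes differ only in the shapes of rules they allow.

```python
# Rules are (lhs, rhs) pairs; nonterminals are uppercase strings ("NP"),
# terminals are lowercase category symbols ("det", "vi", ...).

def is_regular_rule(lhs, rhs):
    """Right-linear rules only: NT -> t NT  or  NT -> t."""
    return (len(rhs) == 1 and rhs[0].islower()) or \
           (len(rhs) == 2 and rhs[0].islower() and rhs[1].isupper())

def is_cfg_rule(lhs, rhs):
    """Context-free rules: one nonterminal on the left, any non-empty right-hand side."""
    return lhs.isupper() and len(rhs) >= 1

# Hypothetical toy rules (not drawn from CFG-S or CFG-L):
rules = [
    ("S",  ["NP", "VP"]),         # context-free only: a nonterminal comes first on the right
    ("S",  ["aux", "NP", "vi"]),  # context-free only: more than two symbols on the right
    ("NP", ["det", "N"]),         # t NT: allowed in a regular grammar too
    ("VP", ["vi"]),               # single terminal: allowed in both classes
]

for lhs, rhs in rules:
    print(f"{lhs} -> {' '.join(rhs):12s} regular: {is_regular_rule(lhs, rhs)!s:6s} "
          f"context-free: {is_cfg_rule(lhs, rhs)}")
```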

Specific hierarchical grammars: hand-designed
- CFG-S (standard CFG): designed to be as linguistically plausible as possible; 77 rules, 15 non-terminals
- CFG-L (larger CFG): derived from CFG-S; contains additional productions corresponding to different expansions of the same non-terminal (puts less probability mass on recursive productions); 133 rules, 15 non-terminals

Specific linear grammars: hand-designed
(ranging from exact fit with no compression to poor fit with high compression)
- FLAT: a list of each sentence; 2336 rules, 0 non-terminals (exact fit, no compression)
- REG-N: narrowest regular grammar derived from the CFG; 289 rules, 85 non-terminals
- REG-M: mid-level regular grammar derived from the CFG; 169 rules, 14 non-terminals
- REG-B: broadest regular grammar derived from the CFG; 117 rules, 10 non-terminals
- 1-STATE: anything accepted; 26 rules, 0 non-terminals (poor fit, high compression)

Automated search
- Local search around the hand-designed grammars
- Linear: unsupervised, automatic HMM learning (Goldwater & Griffiths, 2007), a Bayesian model for acquisition of a trigram HMM (designed for POS tagging, but given a corpus of syntactic categories, it learns a regular grammar)

The model
- Data: child-directed speech (CHILDES)
- Grammars: linear & hierarchical; hand-designed & the result of local search; linear also via automatic, unsupervised ML
- Evaluation: complexity vs. fit

Grammars
- T: type of grammar (context-free; regular; flat, 1-state), with an unbiased (uniform) prior over types
- G: specific grammar
- D: data

Grammars
- T: type of grammar (context-free; regular; flat, 1-state)
- G: specific grammar
- D: data
- A grammar is scored by its complexity (the prior) and its data fit (the likelihood)
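Reading off the labels on this slide, the scoring rule is standard Bayesian model comparison over grammar types and grammars; written out (a reconstruction consistent with the slide, though the talk's exact notation may differ):

```latex
% Posterior over a specific grammar G of type T given the corpus D:
% P(T) is uniform ("unbiased"), P(G | T) penalizes complexity, and
% P(D | G) measures fit to the data.
P(G, T \mid D) \;\propto\;
  \underbrace{P(D \mid G)}_{\text{data fit (likelihood)}}\;
  \underbrace{P(G \mid T)}_{\text{complexity (prior)}}\;
  \underbrace{P(T)}_{\text{uniform}}
```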

Tradeoff: complexity vs. fit
- Low prior probability = more complex
- Low likelihood = poor fit to the data
(Figure: three example grammars along the tradeoff, labeled "fit: low, simplicity: high", "fit: moderate, simplicity: moderate", and "fit: high, simplicity: low")

Measuring complexity: the prior
- Think of designing a grammar (a God's-eye view)
- Grammars with more rules and non-terminals will have lower prior probability
- Notation: n = number of nonterminals; P_k = number of productions of nonterminal k; N_i = number of items in production i; V = vocabulary size; θ_k = production-probability parameters for nonterminal k
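The prior formula itself did not survive this transcript; the sketch below is one form consistent with the variable definitions above (a hedged reconstruction, not necessarily the authors' exact prior): the numbers of nonterminals, productions, and production items are drawn from distributions favoring small values (geometric distributions are assumed here), and each production item is chosen uniformly from the V-item vocabulary, so larger grammars receive lower prior probability.

```latex
% Hedged reconstruction of a complexity-penalizing prior over grammars of type T:
% n nonterminals; for each nonterminal k, parameters theta_k and P_k productions;
% production i has N_i items, each chosen uniformly from V possibilities.
P(G \mid T) \;=\; p(n)\,\prod_{k=1}^{n}\, p(\theta_k)\, p(P_k)
              \prod_{i=1}^{P_k} p(N_i)\left(\frac{1}{V}\right)^{N_i}
```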

Measuring fit: the likelihood
- The probability of the grammar generating the data: the product of the probabilities of the parse of each sentence
- Ex: for the sentence type "pro aux det n", the probability of its parse is the product of the rule probabilities used, 0.5 × 0.25 × 1.0 × 0.25 × 0.5 ≈ 0.016
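A minimal sketch of this computation (the rule probabilities below are the ones quoted in the example; for simplicity it takes one parse per sentence type, though the model may sum or maximize over parses):

```python
import math

# Rule probabilities along the example parse of the tag sequence "pro aux det n".
parse_rule_probs = [0.5, 0.25, 1.0, 0.25, 0.5]

def parse_probability(rule_probs):
    """Probability of one parse = product of the probabilities of the rules it uses."""
    return math.prod(rule_probs)

def corpus_log_likelihood(parses):
    """log P(D | G): sum of log parse probabilities over the corpus of sentence types."""
    return sum(math.log(parse_probability(p)) for p in parses)

print(round(parse_probability(parse_rule_probs), 4))            # 0.0156 (~0.016)
print(round(corpus_log_likelihood([parse_rule_probs] * 3), 2))  # -12.48 for a toy 3-sentence corpus
```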

Plan
- Model
  - Data: corpus of child-directed speech (CHILDES)
  - Grammars: linear & hierarchical; hand-designed & the result of local search; linear also via automated, unsupervised ML
  - Evaluation: complexity vs. fit
- Results
- Implications

Results: data split by frequency levels (estimate of comprehension)
(Table: log posterior probability, lower magnitude = better, for FLAT, REG-N, REG-M, REG-B, REG-AUTO, 1-ST, CFG-S, and CFG-L at each corpus level; the numerical values appear on the slide.)

Results: data split by age (estimate of availability)

(Table: log posterior probability, lower magnitude = better, for FLAT, REG-N, REG-M, REG-B, REG-AUTO, 1-ST, CFG-S, and CFG-L at each corpus epoch; the numerical values appear on the slide.)

Generalization: How well does each grammar predict sentences it hasn’t seen?

Complex interrogatives
- Simple declarative: Eagles do fly. (n aux vi)
- Simple interrogative: Do eagles fly? (aux n vi)
- Complex declarative: Eagles that are alive do fly. (n comp aux adj aux vi)
- Complex interrogative: Do eagles that are alive fly? (aux n comp aux adj vi)
- Complex interrogative: Are eagles that alive do fly? (aux n comp adj aux vi)
(Table: for each sentence type, whether it appears in the corpus and whether each of RG-N, RG-M, RG-B, AUTO, 1-ST, CFG-S, and CFG-L generates it; the cell values appear on the slide.)
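The sketch below (a hypothetical toy PCFG, far smaller than CFG-S/CFG-L, parsed with NLTK's ViterbiParser) illustrates the kind of check the table reports: a hierarchical grammar that covers the simple sentence types also generates the unseen complex interrogative, while assigning no parse to the structure-independent error.

```python
import nltk
from nltk.parse import ViterbiParser

# Hypothetical toy hierarchical grammar over syntactic-category symbols.
toy_pcfg = nltk.PCFG.fromstring("""
S    -> NP VP   [0.5] | AUX SQ [0.5]
SQ   -> NP VI   [1.0]
NP   -> N       [0.6] | N RC   [0.4]
RC   -> COMP AP [1.0]
AP   -> AUX ADJ [1.0]
VP   -> AUX VI  [1.0]
N    -> 'n'     [1.0]
AUX  -> 'aux'   [1.0]
VI   -> 'vi'    [1.0]
COMP -> 'comp'  [1.0]
ADJ  -> 'adj'   [1.0]
""")
parser = ViterbiParser(toy_pcfg)

test_sequences = [
    "n aux vi",               # Eagles do fly.
    "aux n vi",               # Do eagles fly?
    "n comp aux adj aux vi",  # Eagles that are alive do fly.
    "aux n comp aux adj vi",  # Do eagles that are alive fly?
    "aux n comp adj aux vi",  # *Are eagles that alive do fly?
]

for seq in test_sequences:
    trees = list(parser.parse(seq.split()))
    print(f"{seq:26s} {'p = %.4f' % trees[0].prob() if trees else 'no parse'}")
```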

Take-home messages
- Shown that, given reasonable domain-general assumptions, an unbiased rational learner could realize that languages have a hierarchical structure based on typical child-directed input
- This paradigm is valuable: it makes any assumptions explicit and enables us to rigorously evaluate how different representations capture the tradeoff between simplicity and fit to data
- In some ways, "higher-order" knowledge may be easier to learn than specific details (the "blessing of abstraction")

Implications for innateness?
- Ideal learner
- Strong(er) assumptions: the learner can find the best grammar in the space of possibilities
- Weak(er) assumptions: the learner has the ability to parse the corpus into syntactic categories; the learner can represent both linear and hierarchical grammars; assume a particular way of calculating complexity & data fit
- Have we actually found representative grammars?

The End
Thanks also to the following for many helpful discussions: Virginia Savova, Jeff Elman, Danny Fox, Adam Albright, Fei Xu, Mark Johnson, Ken Wexler, Ted Gibson, Sharon Goldwater, Michael Frank, Charles Kemp, Vikash Mansinghka, Noah Goodman

Grammars
- T: grammar type (context-free; regular; flat, 1-state)
- G: specific grammar
- D: data

The Argument from the Poverty of the Stimulus (PoS)
P1. Children show a specific pattern of behavior B
P2. A particular generalization G must be grasped in order to produce B
P3. It is impossible to reasonably induce G simply on the basis of the data D that children receive
C1. Some abstract knowledge T, limiting which specific generalizations G are possible, is necessary
Corollary: the abstract knowledge T could not itself be learned, or could not be learned before G is known
C2. T must be innate

Instantiation:
- G: a specific grammar
- D: typical child-directed speech input
- B: children don't make certain mistakes (they don't seem to entertain structure-independent hypotheses)
- T: language has hierarchical phrase structure

Data
- Final corpus contained 2336 individual sentence types corresponding to sentence tokens
- Why types?
  - Grammar learning depends on what sentences are generated, not on how many of each type there are
  - Much more computationally tractable
  - The distribution of sentence tokens depends on many factors other than the grammar (e.g., pragmatics, semantics, discussion topics) [Goldwater, Griffiths & Johnson, 2005]


Why these results?
- Natural language actually is generated from a grammar that looks more like a CFG
- The other grammars overfit and therefore do not capture important language-specific generalizations (illustrated on the slide with the FLAT grammar)

Computing the prior…
- Regular grammar (REG): productions of the form NT → t NT, NT → t
- Context-free grammar (CFG): productions of the form NT → NT NT, NT → t NT, NT → NT, NT → t

Likelihood, intuitively
- Z is ruled out because it does not explain some of the data points
- X and Y both "explain" the data points, but X is the more likely source

Possible empirical tests
- Present people with data the model learns FLAT, REG, and CFG grammars from; see which novel productions they generalize to
  - Non-linguistic? To small children?
- Examples of learning regular grammars in real life: does the model do the same?

Do people learn regular grammars? Children's songs: line-level grammar
(Diagrams on the slide show simple state machines, with states s1, s2, s3, over the words of each line)
- "Miss Mary Mack, Mack, Mack / All dressed in black, black, black / With silver buttons, buttons, buttons / All down her back, back, back / She asked her mother, mother, mother, …"
- "Spanish dancer, do the splits. / Spanish dancer, give a kick. / Spanish dancer, turn around."

Do people learn regular grammars? Children's songs: song-level grammar
(Diagram on the slide shows a state machine, with states s1, s2, s3, over the lines of each song)
- "Teddy bear, teddy bear, turn around. / Teddy bear, teddy bear, touch the ground. / Teddy bear, teddy bear, show your shoe. / Teddy bear, teddy bear, that will do. / Teddy bear, teddy bear, go upstairs. / …"
- "Bubble gum, bubble gum, chew and blow, / Bubble gum, bubble gum, scrape your toe, / Bubble gum, bubble gum, tastes so sweet,"
- "Dolly Dimple walks like this, / Dolly Dimple talks like this, / Dolly Dimple smiles like this, / Dolly Dimple throws a kiss."

Do people learn regular grammars? Songs containing items represented as lists (where order matters)
- "A my name is Alice / And my husband's name is Arthur, / We come from Alabama, / Where we sell artichokes. / B my name is Barney / And my wife's name is Bridget, / We come from Brooklyn, / Where we sell bicycles. / …"
- "Dough a Thing I Buy Beer With / Ray a guy who buys me beer / Me, the one who wants a beer / Fa, a long way to the beer / So, I think I'll have a beer / La, -gers great but so is beer! / Tea, no thanks I'll have a beer / …"
- "Cinderella, dressed in yella, / Went upstairs to kiss a fella, / Made a mistake and kissed a snake, / How many doctors did it take? / 1, 2, 3, …"

Do people learn regular grammars? Most of the song is a template, with a repeated (varying) element
- "You put your [body part] in / You put your [body part] out / You put your [body part] in and you shake it all about / You do the hokey pokey / And you turn yourself around / And that's what it's all about!"
- "If I were the marrying kind / I thank the lord I'm not sir / The kind of rugger I would be / Would be a rugby [position/item] sir / Cos I'd [verb phrase] / And you'd [verb phrase] / We'd all [verb phrase] together / …"
- "If you're happy and you know it [verb] your [body part] / If you're happy and you know it then your face will surely show it / If you're happy and you know it [verb] your [body part]"

Do people learn regular grammars? Other interesting structures…
- "There was a farmer had a dog, / And Bingo was his name-O. / B-I-N-G-O! / And Bingo was his name-O!" (each subsequent verse, replace a letter with a clap)
- "I know a song that never ends, / It goes on and on my friends, / I know a song that never ends, / And this is how it goes:" (repeat)
- "Oh, Sir Richard, do not touch me" (each subsequent verse, remove the last word at the end of the sentence)

New PRG: 1-state
(Diagram: a single state S, with a transition for each of det, n, pro, prop, prep, adj, aux, wh, comp, to, v, vi, part, and a transition to End)
- Log(prior) = 0; no free parameters

Another PRG: standard + noise
- For instance, the level-1 PRG + noise would be the best regular grammar for the corpus at level 1, plus the 1-state model
- This could parse all levels of evidence
- Perhaps this would be better than a more complicated PRG at later levels of evidence

Results: frequency levels (comprehension estimates)
(Table: log prior (P) and log likelihood (L), in absolute value, plus the log posterior, smaller is better, for Flat, RG-L, RG-S, CFG-S, and CFG-L at each corpus level; the numerical values appear on the slide.)

Results: availability by age
(Table: log prior (P) and log likelihood (L), in absolute value, plus the log posterior, smaller is better, for Flat, RG-L, RG-S, CFG-S, and CFG-L at each period; the numerical values appear on the slide.)

Specific grammars of each type
- One type of hand-designed grammar: 69 productions, 14 nonterminals; 390 productions, 85 nonterminals

Specific grammars of each type
- The other type of hand-designed grammar: 126 productions, 14 nonterminals; 170 productions, 14 nonterminals

The Argument from the Poverty of the Stimulus (PoS)
P1. It is impossible to have made some generalization G simply on the basis of data D
P2. Children show behavior B
P3. Behavior B is not possible without having made G
C1. Some constraints T, which limit what type of generalizations G are possible, must be innate
Instantiation:
- G: a specific grammar
- D: typical child-directed speech input
- B: children don't make certain mistakes (they don't seem to entertain structure-independent hypotheses)
- T: language has hierarchical phrase structure

#1: Children hear complex interrogatives
- Well, a few, but not many; how much is "enough"?
- Adam (CHILDES): 0.048% of input; no yes-no questions, four wh-questions (e.g., "What is the music it's playing?")
- Nina (CHILDES): 0.068% of input; no yes-no questions, 14 wh-questions
- In all, most estimates are << 1% of input (Legate & Yang, 2002)

#2: Can get the behavior without structure
- There is enough statistical information in the input to be able to conclude which type of complex interrogative is ungrammatical (Reali & Christiansen, 2004; Lewis & Elman, 2001)
- Rare: comp adj aux; common: comp aux adj
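A minimal sketch of this kind of statistical-learning argument (the tag-sequence corpus below is a toy, hypothetical stand-in for child-directed speech): simply counting tag trigrams distinguishes the attested comp aux adj pattern from the unattested comp adj aux pattern.

```python
from collections import Counter

# Toy, hypothetical corpus of tag sequences; the real argument
# (Lewis & Elman, 2001; Reali & Christiansen, 2004) estimates n-gram
# statistics from actual child-directed speech.
corpus = [
    "pro aux v det n",
    "aux det n adj",
    "det n comp aux adj aux vi",
    "aux det n comp aux adj vi",
    "wh aux det n vi",
]

def trigram_counts(sentences):
    """Count tag trigrams over space-separated tag sequences."""
    counts = Counter()
    for sentence in sentences:
        tags = sentence.split()
        for i in range(len(tags) - 2):
            counts[tuple(tags[i:i + 3])] += 1
    return counts

counts = trigram_counts(corpus)
print("comp aux adj:", counts[("comp", "aux", "adj")])  # attested (count 2 here)
print("comp adj aux:", counts[("comp", "adj", "aux")])  # unattested (count 0 here)
```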

#2: Can get the behavior without structure
- Response: there is enough statistical information in the input to be able to conclude that "Are eagles that alive can fly?" is ungrammatical (Reali & Christiansen, 2004; Lewis & Elman, 2001); rare: comp adj aux, common: comp aux adj
- But this sidesteps the question: it does not address the innateness of structure (the abstract knowledge T)
- And it is explanatorily opaque

Why do linguists believe that language has hierarchical phrase structure?
- Formal properties plus an information-theoretic, simplicity-based argument (Chomsky, 1956):
  - A sentence S has an (i, j) dependency if replacement of the i-th symbol a_i of S by b_i requires a corresponding replacement of the j-th symbol a_j of S by b_j
  - If S has an m-termed dependency set in L, at least 2^m states are necessary in the finite-state grammar that generates L. Therefore, if L is a finite-state language, then there is an m such that no sentence S of L has a dependency set of more than m terms in L
  - The "mirror language", made up of sentences consisting of a string X followed by X in reverse (e.g., aa, abba, babbab, aabbaa, etc.), has the property that for any m we can find a dependency set D = {(1, 2m), (2, 2m-1), ..., (m, m+1)}. Therefore it cannot be captured by any finite-state grammar
  - English has infinite sets of sentences with dependency sets of more than any fixed number of terms. E.g., in "the man who said that S5 is arriving today", there is a dependency between "man" and "is". Therefore English cannot be finite-state
  - A possible counterargument: since any finite corpus could be captured by a finite-state grammar, English is only not finite-state in the limit; in practice, it could be
- Easy counterargument: simplicity considerations. Chomsky: "If the processes have a limit, then the construction of a finite-state grammar will not be literally impossible (since a list is a trivial finite-state grammar), but this grammar will be so complex as to be of little use or interest."

The big picture: Innate vs. Learned

Grammar Acquisition (Chomsky): Innate vs. Learned

The Argument from the Poverty of the Stimulus (PoS)
P1. Children show behavior B
P2. Behavior B is not possible without having some specific grammar or rule G
P3. It is impossible to have learned G simply on the basis of data D
C1. Some constraints T, which limit what type of grammars are possible, must be innate

Replies to the PoS argument
P1. It is impossible to have made some generalization G simply on the basis of data D
  - Reply: there are enough complex interrogatives in D (e.g., Pullum & Scholz, 2002)
P2. Children show behavior B
P3. Behavior B is not possible without having made G
  - Reply: there is a route to B other than G, via statistical learning (e.g., Lewis & Elman, 2001; Reali & Christiansen, 2005)
C1. Some constraints T, which limit what type of generalizations G are possible, must be innate

Innate / Learned

Innate / Learned; Explicit structure / No explicit structure


Our argument
- Assumptions: the learner is equipped with
  - the capacity to represent both linear and hierarchical grammars (no bias)
  - a rational Bayesian learning mechanism & probability calculation
  - the ability to effectively search the space of possible grammars

Take-home message
- Shown that, given reasonable domain-general assumptions, an unbiased rational learner could realize that languages have a hierarchical structure based on typical child-directed input
- We can use this paradigm to explore the role of recursive elements in a grammar
  - The "winning" grammar contains additional non-recursive counterparts for complex NPs
  - Perhaps language, while fundamentally recursive, contains duplicate non-recursive elements that more precisely match the input?

The role of recursion
- Evaluated an additional grammar (CFG-DL) that contained no recursive complex NPs at all; instead, it has multiply-embedded, depth-limited ones
- No sentence in the corpus occurred with more than two levels of nesting

The role of recursion: results
(Table: log posterior probability, lower magnitude = better, for FLAT, REG-N, REG-M, REG-B, 1-ST, CFG-S, CFG-L, and CFG-DL at each corpus level; the numerical values appear on the slide.)

The role of recursion: implications
- The optimal tradeoff results in a grammar that "goes beyond the data" in interesting ways: auxiliary fronting; recursive complex NPs
- A grammar with recursive complex NPs is more optimal, even though:
  - recursive productions hurt in the likelihood
  - there are no sentences with more than two levels of nesting in the input