Introduction to Language Acquisition Theory
Janet Dean Fodor
St. Petersburg, July 2013
Class 2. From computer science (then) to psycholinguistics (now)

Syntax acquisition as parameter setting
• Like playing "20 questions". The learner's task is to detect the correct settings of a finite number of parameters.
• Headedness parameter: Are syntactic phrases head-initial (e.g., in VP, the verb precedes its object) or head-final (the verb follows the object)?
• Wh-movement parameter: Does a wh-phrase move to the top of a clause, or does it remain in situ?
• Parameter values are 'triggered' by the learner's encountering a distinctive, revealing property of an input sentence.
• This Principles-and-Parameters approach has been retained through many subsequent changes in TG theory.
• It greatly reduces a learner's workload of data-processing, and it helps address the Poverty of the Stimulus problem.
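To make the search-space framing concrete, here is a minimal sketch (in Python; not part of the original slides) of a grammar as a vector of binary parameter settings. The parameter names are illustrative only, not an exhaustive UG inventory; the point is that n binary parameters define a finite space of 2**n candidate grammars.

```python
from itertools import product

# Illustrative binary parameters; the names are examples, not an exhaustive UG inventory.
PARAMETERS = ["head_initial", "wh_movement", "null_subject"]

# A grammar is one assignment of a value to every parameter, so n binary
# parameters define a finite space of 2**n candidate grammars.
all_grammars = [dict(zip(PARAMETERS, setting))
                for setting in product([True, False], repeat=len(PARAMETERS))]

print(len(all_grammars))  # 8 = 2**3
```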

Parameter setting as flipping switches
• Chomsky never provided a specific implementation of parametric triggering. He often employed a metaphor of setting switches. (Chomsky 1981/1986)
• The metaphor suggests that parameter setting is:
  › Automatic, instantaneous, effortless: no linguistic reasoning is required of the learner. (Unlike hypothesis-formation models.)
  › Input-guided (no trial-and-error process).
  › A universal mechanism, but leading reliably to language-specific parameter settings.
  › Non-interacting: each parameter can be set separately.
  › Each has unambiguous triggers, recognizable regardless of what else the learner does or doesn't know about the language.
  › Deterministic learning: fully accurate, so no revision is ever needed.
• A wonderful advance, if true – if psychologically feasible!

But computational linguists couldn't implement it (parameters yes; triggering no)
• Syntacticians largely embraced this neat picture.
• But as a mechanism, triggering was never implemented. Computational linguists deemed it unfeasible, due to the ambiguity and opacity of would-be triggers in the natural language domain (Clark 1989). Examples on the next slide.
• Only the concept of parameterization was retained: language acquisition is selection of a grammar from a finite set, which is defined by UG (innate principles + innate parametric choices).
• The learning process was modeled as a trial-and-error search through the domain of all possible grammars, applying familiar domain-general learning algorithms from computer science.
• No input guidance toward the correct grammar. Input serves only as feedback on hypotheses selected partly at random.

Why doesn't instant triggering work?
• Input ambiguity: e.g., Exceptional Case Marking (Clark 1989).
  We consider him to be clever. ECM, or does the infinitive assign accusative case?
  I consider myself to be clever. Long-distance anaphora?
• Derivational opacity: e.g., Adv P not Verb Subj entails -NullSubj. Why?! Because a P with no overt object must be due to object-topicalization followed by topic-drop, and +NullTopic entails -NullSubject.
• Conclusion: It's impossible or impractical to recognize the parameter values from the surface sentence. Learners have to guess. (Counter-argument in Classes 6 & 7.)
• Also, classic triggering mis-predicts child data (Yang 2002): children's grammar changes are gradual, so they must be contemplating two or more (many?) grammars simultaneously.

Trial-and-error domain-search methods: under-powered or over-resourced
• Genetic algorithm. Clark & Roberts (1993)
  Test many grammars each on many sentences, rank them, breed them, repeat, repeat. (Over-resourced)
• Triggering Learning Algorithm. Gibson & Wexler (1994)
  Test one grammar at a time, on one sentence. If it fails, change one parameter at random. (Under-powered; fails often, slow)
• Variational Model. Yang (2000) → next slide
  Give the TLA a memory for the success-rate of each parameter value. Test one grammar, but sample the whole domain.
• Bayesian Learner. Perfors, Tenenbaum & Regier (2006)
  Test all grammars on the total input sample. Adopt the one with the best mix of simplicity & good fit. (Over-resourced)
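A minimal sketch of the Triggering Learning Algorithm's error-driven step, under simplifying assumptions: grammars are dicts of binary parameter values, and `can_parse` is a hypothetical oracle standing in for a real parametric parser. The "adopt only if the flipped grammar succeeds" condition follows Gibson & Wexler's published greediness constraint, which the slide summarizes as "change one P at random".

```python
import random

def tla_step(grammar, sentence, parameters, can_parse):
    """One error-driven step of the Triggering Learning Algorithm (Gibson & Wexler 1994).

    grammar   -- dict mapping parameter names to boolean values
    can_parse -- oracle: can_parse(grammar, sentence) -> bool (stand-in for a real parser)
    """
    if can_parse(grammar, sentence):
        return grammar                      # current grammar succeeds: no change
    candidate = dict(grammar)
    p = random.choice(parameters)           # flip exactly one randomly chosen parameter
    candidate[p] = not candidate[p]
    if can_parse(candidate, sentence):
        return candidate                    # 'greediness': adopt only if the new grammar succeeds
    return grammar                          # otherwise keep the old grammar unchanged
```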

The Variational Model's memory for how well each parameter value has performed
[Diagram: a weighted scale for each parameter, e.g. head-direction, null subject, wh-movement]
• Test one grammar at a time. If it succeeds, nudge the pointer for each parameter toward the successful P-value. If the grammar fails, nudge the pointers away from those P-values.
• Select a grammar to test next, with probability based on the weights of its P-values.
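A minimal sketch of the Variational Model's bookkeeping, under simplifying assumptions: each parameter carries a weight in [0, 1] read as the probability of its positive value, a grammar is sampled from those weights, and a linear reward-penalty rule nudges the weights after each sentence. The learning-rate constant and the `can_parse` oracle are illustrative, not Yang's exact formulation.

```python
import random

GAMMA = 0.02  # learning-rate constant (illustrative value, not Yang's)

def sample_grammar(weights):
    """Pick a grammar by sampling each parameter independently from its current weight."""
    return {p: random.random() < w for p, w in weights.items()}

def vm_step(weights, sentence, can_parse):
    """One Variational Model step: sample a grammar, test it on the sentence,
    and nudge every parameter weight toward (reward) or away from (penalize)
    the sampled values, depending on whether the parse succeeded."""
    grammar = sample_grammar(weights)
    success = can_parse(grammar, sentence)
    for p, value in grammar.items():
        target = 1.0 if value else 0.0
        if success:
            weights[p] += GAMMA * (target - weights[p])          # move toward the value just used
        else:
            weights[p] += GAMMA * ((1.0 - target) - weights[p])  # move toward the opposite value
    return weights
```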

Varieties of domain-search, illustrated
• Think of an Easter egg hunt. The eggs are the parameter values, to be found. The search domain is the park.
• Genetic Algorithm: Send out hordes of searchers, compare notes.
• Triggering Learning Algorithm: A lone searcher, following its own nose, in small steps: "getting warmer".
• Variational Model: Mark findings/failures on a rough map to focus the search; occasionally dash to another spot to see what's there.
• Compare these with decoding: First consult the sentence! Read a clue, decipher its meaning, go where it says; the egg is there.

Varieties of domain-search, illustrated (continued)
• GA: Send out hordes of searchers, compare notes. (Vast effort)
• TLA: A lone searcher, following its own nose, small steps: "getting warmer". (Slow progress)
• VM: Mark findings/failures on a rough map; occasionally dash to another spot to see what's there. (Still a needle in a haystack)

Yang's VM: the best current search model
• Can learn from every input sentence.
• Choice of a grammar to try is based on its track record.
• But no decoding, so it extracts little information per sentence: only can/cannot parse, not why, or what would help.
• Can't recognize unambiguity.
• Non-deterministic: parameters may swing back and forth between the two values repeatedly.
• Inefficiency increases with the size of the domain, perhaps exponentially (especially if the domain is not 'smooth').
• Yang's simulations and ours agree: the VM consumes an order of magnitude more input than decoding models.

Is the VM plausible as psychology?
• The VM improves on the TLA, achieving more effective search with modest resources. And it avoids getting permanently trapped in a wrong corner of the domain (a 'local minimum').
• But it has some strange, un-human-like(?) properties:
• Irrelevant parameter values are rewarded / punished, e.g., prep-stranding in a sentence with no prepositions. Without decoding, the VM can't know which parameters are relevant to the input sentence.
• To explore, it tests some grammars that are NOT highly valued at present.
• So the child will often fail to parse a sentence, even if her currently best grammar can parse it! Exploring fights normal language use.

What's more psychologically realistic?
• A crucial aspect of the VM is that even low-valued grammars are occasionally tried out on input sentences.
• But is this what children do?
• When a toddler hears an utterance, what goes on in her brain? Specifically: what grammar does she try to process the sentence with?
• Surely she'd apply her currently 'highest-valued' grammar? Why would she use one that she believes to be wrong?
• A low-valued grammar would often fail to deliver a successful parse of the sentence. When it fails, the child doesn't (linguistically) understand the sentence – even if it's one she understood yesterday and it is generated by her current 'best' grammar!

CUNY's alternative: Learning by parsing
• This is a brief preview. We'll go into more detail in Class 7.
• A child's aim is to understand what people are saying.
• So, just like adults, children try to parse the sentences they hear. (Assign structure to the word string; semantic composition.)
• When the child's grammar licenses an input, her parsing routines function just as in adult sentence comprehension.
• When the sentence lies beyond her current grammar, the parsing mechanism can process parts of the sentence but not all. It seeks a way to complete the parse tree. (Not just yes/no.)
• To do so, it draws on the additional parameter values that UG makes available, seeking one that can solve the problem.
• If a parameter value succeeds in rescuing the parse, that means it's useful, so it is adopted into the grammar.
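A minimal sketch of this parse-and-rescue loop, under simplifying assumptions: a grammar is a set of treelets, and `parse_with` is a hypothetical stand-in for a real incremental parser, not CUNY's actual implementation.

```python
def learn_by_parsing(current_grammar, ug_treelets, sentence, parse_with):
    """Try to parse with the single currently-best grammar (a set of treelets);
    if the parse cannot be completed, search UG's wider pool of treelets for one
    that rescues it, and adopt that treelet into the grammar.

    parse_with -- hypothetical incremental parser: parse_with(treelets, sentence)
                  returns a parse tree, or None if the tree cannot be completed.
    """
    tree = parse_with(current_grammar, sentence)
    if tree is not None:
        return current_grammar, tree            # normal comprehension; no learning needed
    for treelet in ug_treelets - current_grammar:
        tree = parse_with(current_grammar | {treelet}, sentence)
        if tree is not None:
            # The new treelet was needed to complete the parse, so it is adopted.
            return current_grammar | {treelet}, tree
    return current_grammar, None                # sentence not parsable (yet)
```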

So a parameter value must be something the parser can use
• What a parser (adult or child) really needs is a way to connect an incoming word into the tree structure being built: some linkage of syntactic nodes and branches.
• At CUNY we take parameter values to be UG-specified 'treelets' that the parser can use. (Not switch-settings.)
• A treelet is a sub-structure of larger sentential trees (typically underspecified in some respects).
• Example treelet: a PP node immediately dominating a preposition and a nominal trace. It indicates a positive value for the preposition-stranding parameter (Who are you talking with now? vs. *Avec qui parles-tu maintenant?).
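A minimal sketch of how such a treelet might be encoded, assuming a simple labeled-node representation; the class and node labels are illustrative, not CUNY's actual data structure.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Node:
    """A (possibly underspecified) labeled node in a treelet."""
    label: str
    children: tuple = field(default_factory=tuple)

# Example treelet for the positive value of the preposition-stranding parameter:
# a PP immediately dominating a preposition and a phonologically null nominal trace,
# linked to a fronted wh-phrase elsewhere in the tree.
P_STRANDING_TREELET = Node("PP", (
    Node("P"),         # the preposition itself, e.g. 'with' in "Who are you talking with?"
    Node("NP-trace"),  # null complement, bound by the fronted wh-phrase
))
```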

Children do what adults do: Example
• E.g., Which rock can you jump to from here? has a stranded preposition to, with no overt complement. That becomes evident at the word from.
• For an adult English speaker, the parsing mechanism has access to a possible piece of tree structure (a 'treelet') which inserts a phonologically null complement to the preposition, and links it to a fronted wh-phrase. See the tree diagram on the next slide.
• Now consider a child who already knows wh-movement but not yet preposition stranding (maybe not realistic!). The child's parser would do exactly the same as the adult's, up to the word from.
• The child's current grammar offers no means of continuing the parse. It has no treelet that fits between to and from. So it must look and see whether UG can provide one.

In English, a preposition may have a null complement. Learners will discover this as they parse.
[Tree diagram: the fronted wh-phrase coindexed with a null (+null) complement of the preposition]

Children must reach out to UG
• The child's parser must search for a treelet in the wider pool of candidates made available by UG, to identify one that will fill that gap in the parse tree.
• Once found, that treelet becomes part of the learner's grammar, for future use in understanding and producing sentences with stranded prepositions.
• Summary: In the treelet model, the learner's innate parsing mechanism works with the learner's single currently best grammar hypothesis, and upgrades it on-line just if and where it finds that a new treelet is needed in order to parse an incoming sentence.
• A child's processing of sentences differs from an adult's only in the need to reach out to UG for new treelets.

Compared with domain-search systems
• In this way, the specific properties of input sentences provide a word-by-word guide to the adoption of relevant parameter values, in a narrowly channeled process. E.g., what to do if you encounter a sentence containing a preposition without an overt object.
• This input-guidance gets the maximum benefit from the information the input contains.
• It requires no specifically-evolved learning mechanism for language. (But it does need access to UG.)
• It makes use of the sentence parsing mechanism, which is needed in any case – and which is generally regarded as being innate, ready to function as soon as the child knows some words.

Please read before Friday (Class 3)
• The 2-page article "Positive and negative evidence in language acquisition", by Grimshaw & Pinker. On the availability and utility of negative data.
• The key questions: Does negative evidence exist? Do language learners use it? Do language learners need to?