The Past Tense: Neural Networks and Non-Symbolic Computation

Abstraction (again!): powerful, but costly. How much is needed in human language? Model system: English past tense.

Classic Developmental Story
Initial mastery of regular and irregular past tense forms
Overregularization appears only later (e.g., goed, comed)
‘U-shaped’ developmental pattern taken as evidence for learning of a morphological rule: V + [+past] --> stem + /d/

Rumelhart & McClelland 1986: the model learns to classify regulars and irregulars based on sound similarity alone, and shows a U-shaped developmental profile.

What is really at stake here? Abstraction, operations over variables, symbol manipulation, algebraic computation; learning based on input. How do learners generalize beyond input? (e.g., y = 2x)

Gary Marcus

Functions
Input      Output
4, 4       8
2, 3       5
1, 9       10
6, 7       13
341, 257   598
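The contrast behind this slide can be made concrete with a minimal Python sketch. This is our illustration, not part of the lecture. It compares a learner that only memorizes input/output pairs with one that applies an operation over variables. We assume the first four pairs are training items and that 341, 257 is a novel test item. The function names are hypothetical.

```python
# Hypothetical split: the first four pairs from the slide are treated as training data.
TRAINING = {(4, 4): 8, (2, 3): 5, (1, 9): 10, (6, 7): 13}


def memorized(x, y):
    """Pure lookup: returns a stored output, or None for any unseen input."""
    return TRAINING.get((x, y))


def operation(x, y):
    """An operation over variables: applies to any pair of numbers."""
    return x + y


# Both agree on the training items...
assert all(operation(x, y) == out for (x, y), out in TRAINING.items())
# ...but only the operation extends to the novel pair.
print(memorized(341, 257))  # None
print(operation(341, 257))  # 598
```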

Functions
Input   Output
rock    rock
sing    sing
alqz    alqz
dark    dark
lamb    lamb

Functions
Input   Output
0 0     0
1 0     0
0 1     0
1 1     1

Functions
Input   Output
look    looked
rake    raked
sing    sang
go      went
want    wanted
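One way to compute this function is the rule-plus-memory view discussed later in these slides. In that view, stored irregular forms block a default "add -ed" rule. The sketch below is a minimal version under simplifying assumptions. Spelling stands in for phonology. The irregular memory holds only the slide's two irregulars. The verb "plink" is a hypothetical novel verb.

```python
# Hypothetical memory of stored irregular past forms.
IRREGULARS = {"sing": "sang", "go": "went"}


def regular_past(stem):
    """Default rule, written orthographically as a stand-in for stem + /d/."""
    return stem + "d" if stem.endswith("e") else stem + "ed"


def past_tense(stem):
    """A stored irregular form blocks the default; otherwise the rule applies."""
    if stem in IRREGULARS:
        return IRREGULARS[stem]
    return regular_past(stem)


for verb in ["look", "rake", "sing", "go", "want", "plink"]:
    print(verb, past_tense(verb))
```

The rule needs no stored entry for "plink", which is the sense in which it generalizes beyond the input.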

Functions
Input                        Output
John left                    1
Wallace fed Gromit           1
Fed Wallace Gromit           0
Who do you like Mary and?    0

Learning Functions
Learners are shown examples of what the function generates and have to figure out what the function is. Think of language/grammar as a very big function (or set of functions). The learning task is similar: the learner is presented with examples of what the function generates and has to figure out what the system is.
Main question in language acquisition: what does the learner need to know in order to successfully figure out what this function is?
Questions about neural networks: How can a network represent a function? How can the network discover what this function is?
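Both questions can be answered for the AND function from the earlier Functions slide (0 0 → 0, 1 0 → 0, 0 1 → 0, 1 1 → 1). We use a single threshold unit trained with the classic perceptron error-correction rule. This technique was chosen here only for brevity; it is not necessarily the architecture used in past-tense models. The network *represents* AND as two weights and a bias. It *discovers* them by adjusting the weights after each error. The learning rate and epoch count are assumptions.

```python
# Truth table from the slide: output is 1 only when both inputs are 1 (logical AND).
DATA = [((0, 0), 0), ((1, 0), 0), ((0, 1), 0), ((1, 1), 1)]


def predict(w, b, x):
    """Threshold unit: fires (1) if the weighted sum plus bias is positive."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def train_perceptron(data, lr=1.0, epochs=20):
    """Perceptron learning rule: nudge weights by the error on each example."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in data:
            err = target - predict(w, b, x)
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b


w, b = train_perceptron(DATA)
print(w, b)                                 # the learned "representation" of AND
print([predict(w, b, x) for x, _ in DATA])  # [0, 0, 0, 1]
```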

What is not at stake here: feedback, negative evidence, etc.

Who has the most at stake here? Those who deny the need for rules/variables in language have the most to lose here. …If the English past tense is hard, just wait until you get to the rest of natural language! …But if they are successful, they bring with them a simple and attractive learning theory, and mechanisms that can readily be grounded at the neural level. However, if the advocates of rules/variables succeed here or elsewhere, they face the more difficult challenge at the neuroscientific level.

Pinker Ullman

Beyond Sound Similarity
Regulars and Associative Memory
1. Are regulars different?
2. Do regulars implicate operations over variables?
Neuropsychological Dissociations
Other Domains of Morphology


Beyond Sound Similarity
Zero-derived denominals are regular:
  Soldiers ringed the city / *Soldiers rang the city
  high-sticked, grandstanded, … / *high-stuck, *grandstood, …
Productive in adults & children
Shows sensitivity to morphological structure: [[stem]N ø]V-ed
(Pinker & Ullman 2002)
Provides good evidence that sound similarity is not everything.
But nothing prevents a model from using a richer similarity metric:
  morphological structure (for ringed)
  semantic similarity (for low-lifes)
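Here is a minimal sketch of the structure-sensitive account. It assumes a hypothetical lexicon and an explicit `derived_from_noun` flag. The flag plays the role of the [[stem]N ø]V bracketing. The irregular entry belongs to the verb root, so a verb built from a noun never reaches it, however similar it sounds. As the slide points out, a similarity-based model could reach the same outputs if its similarity metric encoded morphological structure. The flag is simply the most direct way to show the dependency.

```python
# Hypothetical lexicon: verb roots with stored irregular past forms.
IRREGULAR_VERB_ROOTS = {"ring": "rang", "stick": "stuck", "stand": "stood"}


def past_tense(stem, derived_from_noun=False):
    """Past tense that is sensitive to morphological structure.

    A zero-derived verb [[stem]N ø]V is headed by a noun, not a verb root, so the
    irregular entry for a homophonous verb root is never consulted.
    """
    if not derived_from_noun and stem in IRREGULAR_VERB_ROOTS:
        return IRREGULAR_VERB_ROOTS[stem]
    return stem + "ed"


print(past_tense("ring"))                                # rang (the bell rang)
print(past_tense("ring", derived_from_noun=True))        # ringed (soldiers ringed the city)
print(past_tense("high-stick", derived_from_noun=True))  # high-sticked
print(past_tense("grandstand", derived_from_noun=True))  # grandstanded
```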

Beyond Sound Similarity
Regulars and Associative Memory
1. Are regulars different?
2. Do regulars implicate operations over variables?
Neuropsychological Dissociations
Other Domains of Morphology

Two types of arguments: storage of regulars; default forms

Regulars & Associative Memory
Regulars are productive and need not be stored.
Irregulars are not productive and must be stored.
But are regulars immune to effects of associative memory? (frequency, over-irregularization)
Pinker & Ullman: regulars may be stored, but they can also be generated on the fly.
A ‘race’ can determine which of the two routes wins.
Some tasks are more likely to show effects of stored regulars.
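A race between the two routes can be sketched in a few lines. Every number below is a hypothetical parameter, not an estimate from any study: the constant rule time and the log-frequency retrieval curve are both invented. For a regular verb both routes produce the same form. Memory wins only when retrieval is faster than the rule, so in this toy model latency depends on frequency for high-frequency stored regulars and is flat below the crossover point.

```python
import math

RULE_TIME_MS = 600.0  # hypothetical: constant time to compute stem + -ed


def retrieval_time_ms(frequency):
    """Hypothetical: stored forms are retrieved faster the more frequent they are."""
    return 900.0 - 100.0 * math.log10(frequency)


def produce_regular_past(stem, stored_frequency=None):
    """Race between memory and rule; returns (form, winning route, latency in ms).

    For a regular verb both routes yield the same form, so only the latency differs.
    """
    form = stem + "ed"
    memory_ms = retrieval_time_ms(stored_frequency) if stored_frequency else math.inf
    if memory_ms < RULE_TIME_MS:
        return form, "memory", memory_ms
    return form, "rule", RULE_TIME_MS


for freq in [None, 10, 100, 10_000, 100_000]:
    print(freq, produce_regular_past("walk", freq))
```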

[Figure slides: base frequencies matched; singular frequency matched vs. base frequency matched; English, singular frequency matched]

Child vs. Adult Impairments
Specific Language Impairment: early claims that regulars show greater impairment than irregulars are not confirmed.
Pinker & Ullman 2002b: ‘The best explanation is that language-impaired people are indeed impaired with rules, […] but can memorize common regular forms.’
Regulars show consistent frequency effects in SLI, not in controls.
‘This suggests that children growing up with a grammatical deficit are better at compensating for it via memorization than are adults who acquired their deficit later in life.’

(Clahsen, 1999)

Low-Frequency Defaults
German plurals:
  die Straße → die Straßen
  die Frau → die Frauen
  der Apfel → die Äpfel
  die Mutter → die Mütter
  das Auto → die Autos
  der Park → die Parks
  die Schmidts
The -s plural is low in frequency and is used for loan words, denominals, names, etc.
Response frequency is not the critical factor in a system that focuses on similarity; the distribution in the similarity space is crucial.
Similarity space with islands of reliability: the network can learn the islands, or the network can learn to associate a form with the space between the islands.
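The second option on the slide is sketched below. We substitute a nearest-neighbour classifier with a distance threshold for a trained network. The two-dimensional coordinates, class labels, and radius are all invented for illustration and are not phonological features of real German nouns. Items close to an island take that island's plural. Anything in the space between islands gets -s, so the default does not need a high type frequency to win.

```python
import math

# Hypothetical 2-D similarity space. Each stored noun is a point labelled with its
# plural class; the coordinates are invented and are not real phonological features.
EXEMPLARS = [
    ((0.0, 0.0), "-(e)n"), ((0.3, 0.1), "-(e)n"), ((0.1, 0.4), "-(e)n"),     # island 1
    ((5.0, 5.0), "umlaut"), ((5.3, 4.8), "umlaut"), ((4.7, 5.2), "umlaut"),  # island 2
]
ISLAND_RADIUS = 1.0  # hypothetical reach of an island of reliability


def plural_class(point):
    """Nearest-neighbour choice, with -s claiming the space between the islands."""
    coords, label = min(EXEMPLARS, key=lambda ex: math.dist(point, ex[0]))
    if math.dist(point, coords) <= ISLAND_RADIUS:
        return label
    return "-s"


print(plural_class((0.2, 0.2)))  # -(e)n: inside island 1
print(plural_class((5.1, 5.0)))  # umlaut: inside island 2
print(plural_class((2.5, 2.5)))  # -s: far from every island
```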

Beyond Sound Similarity
Regulars and Associative Memory
1. Are regulars different?
2. Do regulars implicate operations over variables?
Neuropsychological Dissociations
Other Domains of Morphology

Neuropsychological Dissociations
Ullman et al. 1997
  Alzheimer’s disease patients: poor memory retrieval; poor irregulars, good regulars
  Parkinson’s disease patients: impaired motor control, good memory; good irregulars, poor regulars
  Striking correlation involving laterality of effect
Marslen-Wilson & Tyler 1997
  Normals: past tense primes stem
  2 Broca’s patients: irregulars prime stems; inhibition for regulars
  1 patient with bilateral lesion: regulars prime stems; no priming for irregulars or semantic associates

[Figure slides: Alzheimer’s Disease; Parkinson’s Disease]

Morphological Priming
Lexical decision task: CAT, TAC, BIR, LGU, DOG; press ‘Yes’ if this is a word.
Priming: facilitation in decision times when a related word precedes the target (relative to an unrelated control), e.g., {dog, rug} - cat, where dog is the related prime and rug the unrelated control.
Marslen-Wilson & Tyler 1997
  Regular: {jumped, locked} - jump
  Irregular: {found, shows} - find
  Semantic: {swan, hay} - goose
  Sound: {gravy, sherry} - grave
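The priming measure itself is simple arithmetic. It is the mean decision time after the unrelated control minus the mean decision time after the related prime. A positive value is facilitation and a negative value is inhibition, as reported for regulars in the Broca's patients. The reaction times in this sketch are invented for illustration and are not data from any study. In a full analysis, each condition (regular, irregular, semantic, sound) would get its own effect.

```python
from statistics import mean


def priming_effect(related_rts, unrelated_rts):
    """Facilitation in ms: mean RT after an unrelated control prime minus
    mean RT after a related prime. Positive = priming, negative = inhibition."""
    return mean(unrelated_rts) - mean(related_rts)


# Invented reaction times (ms), for illustration only; not data from any study.
print(priming_effect(related_rts=[560, 580], unrelated_rts=[600, 620]))  # 40
print(priming_effect(related_rts=[640, 660], unrelated_rts=[600, 620]))  # -40
```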

Neuropsychological Dissociations
Bird et al. 2003 complain that arguments for selective difficulty with regulars are confounded with the phonological complexity of the word endings.
Pinker & Ullman 2002: the weight of evidence still supports a dissociation; Bird et al.’s materials contained additional confounds.

Brain Imaging Studies
Jaeger et al. 1996, Language: PET study of past tense
  Task: generate past from stem
  Design: blocked conditions
  Result: different areas of activation for regulars and irregulars
  Is this evidence decisive? Task demands very different; the difference could show up in a network; doesn’t implicate variables.
Münte et al. 1997: ERP study of violations
  Task: sentence reading
  Design: mixed
  Result: regulars ~LAN; irregulars ~N400
  Is this evidence decisive? Allows possibility of comparison with other violations.

[Figure panels: Regular, Irregular, Nonce (Jaeger et al. 1996)]

Beyond Sound Similarity
Regulars and Associative Memory
1. Are regulars different?
2. Do regulars implicate operations over variables?
Neuropsychological Dissociations
Other Domains of Morphology

Abstraction
Phonological categories, e.g., /b/
  Treating different sounds as equivalent
  Failure to discriminate members of the same category
  Treating minimally different words as the same
  Efficient memory encoding
Morphological concatenation, e.g., V + ed
  Productivity: generalization to novel words, novel sounds
  Frequency-insensitivity in memory encoding
  Association with other aspects of ‘procedural memory’

Gary Marcus

Generalization
Training items (input → output):
  1 0 1 0 → 1 0 1 0
  0 1 0 0 → 0 1 0 0
  1 1 1 0 → 1 1 1 0
  0 0 0 0 → 0 0 0 0
Test item: 1 1 1 1 → ? ? ? ?
Humans: 1 1 1 1
Network with local learning: 1 1 1 0. Generalization fails because learning is local.
Network with shared representations: 1 1 1 1. Generalization succeeds because representations are shared.
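Both outcomes can be reproduced with a small Python sketch. This is our construction: the logistic units, learning rate, and epoch count are assumptions. In the local network, every output unit has its own weights. The last input is never on during training, so its weights are never updated, and the last output unit has only ever been trained toward 0. As a result, the test item comes out as 1 1 1 0. In the shared network, one weight and one bias are reused at every position. What is learned about copying a 1 in positions 1 to 3 therefore carries over to position 4.

```python
import math

# Training items from the slide: the target output is always a copy of the input.
TRAIN = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 1, 0], [0, 0, 0, 0]]
TEST = [1, 1, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_local(items, lr=1.0, epochs=5000):
    """Every output unit j has its own weight from every input unit i plus a bias."""
    n = len(items[0])
    W = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for _ in range(epochs):
        gW = [[0.0] * n for _ in range(n)]
        gb = [0.0] * n
        for x in items:
            for j in range(n):
                err = sigmoid(sum(W[j][i] * x[i] for i in range(n)) + b[j]) - x[j]
                for i in range(n):
                    # Zero when input i is off, so weights from the last input never change.
                    gW[j][i] += err * x[i]
                gb[j] += err
        for j in range(n):
            for i in range(n):
                W[j][i] -= lr * gW[j][i] / len(items)
            b[j] -= lr * gb[j] / len(items)
    return lambda x: [round(sigmoid(sum(W[j][i] * x[i] for i in range(n)) + b[j]))
                      for j in range(n)]


def train_shared(items, lr=1.0, epochs=5000):
    """One weight and one bias shared by every position: out_i = f(w * x_i + b)."""
    w, b = 0.0, 0.0
    pairs = [(v, v) for x in items for v in x]  # every position is a copy example
    for _ in range(epochs):
        gw = gb = 0.0
        for v, target in pairs:
            err = sigmoid(w * v + b) - target
            gw += err * v
            gb += err
        w -= lr * gw / len(pairs)
        b -= lr * gb / len(pairs)
    return lambda x: [round(sigmoid(w * v + b)) for v in x]


print(train_local(TRAIN)(TEST))   # [1, 1, 1, 0]
print(train_shared(TRAIN)(TEST))  # [1, 1, 1, 1]
```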

Now another example…

Shared Representation
[Diagrams: Copying 1, Copying 2]
“The key to the representation of variables is whether all inputs in a class are represented by a single node.”

Generalization “In each domain in which there is generalization, it is an empirical question whether the generalization is restricted to items that closely resemble training items or whether the generalization can be freely extended to all novel items within some class.”

How far can a model generalize to novel forms?
  All novel forms that it can represent
  Only some of the novel forms that it can represent
Velar fricative [x], e.g., Bach: what would lead an English speaker to generate the correct plural for Bach?
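Here is a minimal sketch of the "all novel forms that it can represent" option. It assumes a simplified, hypothetical feature table of (voiced, sibilant) pairs. The English plural rule is stated over features: after a sibilant use [ɪz], after another voiceless segment use [s], and otherwise use [z]. Because of this, the rule extends to [x] as soon as [x] is given features, even though no English word ends in it. A segment missing from the table cannot be represented, and the lookup fails.

```python
# Simplified, hypothetical feature table: phoneme -> (voiced, sibilant).
# [x] (as in "Bach") is not an English phoneme, but it can still be given features.
FEATURES = {
    "p": (False, False), "t": (False, False), "k": (False, False), "f": (False, False),
    "θ": (False, False),
    "b": (True, False), "d": (True, False), "g": (True, False), "v": (True, False),
    "m": (True, False), "n": (True, False), "l": (True, False), "r": (True, False),
    "a": (True, False), "i": (True, False), "u": (True, False),
    "s": (False, True), "z": (True, True), "ʃ": (False, True), "ʒ": (True, True),
    "tʃ": (False, True), "dʒ": (True, True),
    "x": (False, False),
}


def plural_suffix(final_phoneme):
    """Choose the plural allomorph from the features of the final segment."""
    voiced, sibilant = FEATURES[final_phoneme]  # KeyError if it cannot be represented
    if sibilant:
        return "ɪz"
    return "z" if voiced else "s"


for segment in ["k", "g", "s", "x"]:
    print(segment, plural_suffix(segment))
```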

Hebrew Word Formation
Roots: lmd 'learning', dbr 'talking'
Word patterns:
  CiCeC      limed      'he learned'
  CiCeC      diber      'he talked'
  CaCaC      lamad      'he studied'
  CiCuC      limud      'study'
  hitCaCeC   hitlamed   'he taught himself'
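The root-and-pattern scheme can be sketched as slot filling: each C in a word pattern is replaced, left to right, by the next consonant of the root. The function name is hypothetical. The root can be a string of one-letter consonants or a list, if a consonant needs a multi-letter symbol.

```python
def interdigitate(root, pattern):
    """Fill each 'C' slot in a word pattern with the next consonant of the root."""
    if pattern.count("C") != len(root):
        raise ValueError("pattern needs exactly one C slot per root consonant")
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in pattern)


for root, pattern in [("lmd", "CiCeC"), ("dbr", "CiCeC"), ("lmd", "CaCaC"),
                      ("lmd", "CiCuC"), ("lmd", "hitCaCeC")]:
    print(root, pattern, interdigitate(root, pattern))
```

Because the slots are filled by position, the same function applies unchanged to roots containing consonants Hebrew lacks. For example, `interdigitate("jjr", "hiCtaCaCtem")` gives "hijtajartem".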

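English phonemes absent from Hebrew
  j (as in jeep)
  ch (as in chair)
  th (as in thick) <-- features absent from Hebrew
  w (as in wide)
Do speakers generalize the Obligatory Contour Principle (OCP) constraint effects to these phonemes? XXY < YXX (roots with identical initial consonants are rated worse than roots with identical final consonants), e.g., jjr < rjj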
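One way to state the constraint as an operation over variables is as an identity check on the first two root slots. This sketch assumes roots are sequences of consonant symbols: strings, or lists for multi-letter symbols such as tʃ for ch. Because it compares slots rather than listing particular consonants, it predicts the same effect for j or ch as for native consonants. Whether Hebrew speakers actually generalize this way is the empirical question the slide raises. The code only shows what a variable-based statement predicts.

```python
def violates_ocp(root):
    """Identity check over the first two root slots (XXY is ill-formed, YXX is fine).

    Because it compares slots rather than listing consonants, it applies to any
    consonant symbol, including ones (like English j) that Hebrew lacks.
    """
    return root[0] == root[1]


for root in ["jjr", "rjj", ["tʃ", "tʃ", "r"], ["r", "tʃ", "tʃ"]]:
    print(root, violates_ocp(root))
```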

Root position vs. word position: *jjr in jajartem and in hijtajartem (pattern hiCtaCaCtem)

Ratings derived from rankings of word triples: 1 = best, 3 = worst; scores subtracted from 4, so higher ratings mean better forms.
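The conversion is rating = 4 − rank: a first-ranked item scores 3 and a last-ranked item scores 1. Scores can then be averaged per word. The sketch below assumes each participant's judgment on a triple is stored as a dictionary from word to rank. The item labels and rankings in the usage line are invented for illustration and are not results.

```python
from statistics import mean


def ratings_from_rankings(rankings):
    """Convert rankings within word triples (1 = best, 3 = worst) into ratings
    (4 - rank, so 3 = best, 1 = worst) and average them per word."""
    scores = {}
    for triple in rankings:
        for word, rank in triple.items():
            scores.setdefault(word, []).append(4 - rank)
    return {word: mean(vals) for word, vals in scores.items()}


# Invented rankings from two hypothetical participants, for illustration only.
print(ratings_from_rankings([{"A": 1, "B": 2, "C": 3}, {"A": 2, "B": 1, "C": 3}]))
```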

What have we learned …?
  Storage of rule-generated word forms + decomposition
  Anomaly detection insensitive to stem frequency
  Ongoing debates over criteria for establishing ‘defaults’
  Neuropsychological dissociations
  Generalization beyond trained items
Does evidence from morphological processing implicate a qualitative distinction between memorized forms and rules (operations involving variables)?