

AND Network
Input   Output
0 0     0
1 0     0
0 1     0
1 1     1

NETWORK CONFIGURED BY TLEARN
# weights after 10000 sweeps
# WEIGHTS
# TO NODE 1
-1.9083807468   ## bias to 1
…               ## i1 to 1
…               ## i2 to 1

OR Network
Input   Output
0 0     0
1 0     1
0 1     1
1 1     1
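
A single unit suffices for AND (and OR). Below is a minimal Python/NumPy sketch with illustrative hand-set weights, not the values TLEARN actually learned: a strongly negative bias plus two positive input weights means the sigmoid output crosses 0.5 only when both inputs are on.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative weights for an AND unit (not the trained TLEARN values):
# the bias holds the unit off unless BOTH inputs contribute their weight.
bias, w_i1, w_i2 = -7.5, 5.0, 5.0

for i1, i2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    out = sigmoid(bias + w_i1 * i1 + w_i2 * i2)
    print(i1, i2, "->", round(out, 3), "=>", int(out > 0.5))

With the same input weights but a smaller (less negative) bias, e.g. -2.5, either input alone pushes the sum past threshold and the unit computes OR instead.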

XOR Network
Input   Output
0 0     0
1 0     1
0 1     1
1 1     0

NETWORK CONFIGURED BY TLEARN
# TO NODE 1
…   ## bias to 1
…   ## i1 to 1
…   ## i2 to 1
# TO NODE 2
…   ## bias to 2
…   ## i1 to 2
…   ## i2 to 2
# TO OUTPUT
…   ## bias to output
…   ## 1 to output
…   ## 2 to output

The mapping from the hidden units to the output is an OR network that never receives a [1 1] input.
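
XOR is not linearly separable, so no single unit of this kind can compute it; the hidden layer makes it possible. A sketch with illustrative hand-set weights (again, not the trained TLEARN values): hidden unit 1 detects "i1 and not i2", hidden unit 2 detects "i2 and not i1", and the output unit simply ORs them, which is why the hidden-to-output mapping is an OR network that never sees a [1 1] pattern.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative weights (not the trained TLEARN values).
# Hidden unit 1 detects "i1 and not i2"; hidden unit 2 detects "i2 and not i1".
W_hidden = np.array([[ 6.0, -6.0],    # weights from i1, i2 to hidden unit 1
                     [-6.0,  6.0]])   # weights from i1, i2 to hidden unit 2
b_hidden = np.array([-3.0, -3.0])

# The output unit is just an OR over the two hidden units...
w_out = np.array([6.0, 6.0])
b_out = -3.0

for i1, i2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    h = sigmoid(W_hidden @ np.array([i1, i2]) + b_hidden)
    out = sigmoid(w_out @ h + b_out)
    # ...and because h1 and h2 are never both high, the OR never sees [1 1].
    print(i1, i2, "hidden:", np.round(h, 2), "output:", int(out > 0.5))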

The Past Tense and Beyond

Classic Developmental Story Initial mastery of regular and irregular past tense forms; overregularization appears only later (e.g., goed, comed); the 'U-shaped' developmental pattern is taken as evidence for learning of a morphological rule: V + [+past] --> stem + /d/

Rumelhart & McClelland 1986 Model learns to classify regulars and irregulars, based on sound similarity alone. Shows U-shaped developmental profile.

What is really at stake here? Abstraction Operations over variables Learning based on input

What is not at stake here Feedback, negative evidence, etc.

Who has the most at stake here? Those who deny the need for rules/variables in language have the most to lose here… but if they are successful, they bring with them a simple and attractive learning theory, and mechanisms that can readily be grounded at the neural level. However, if the advocates of rules/variables succeed here or elsewhere, they face the more difficult challenge at the neuroscientific level.

Questions about Lab 2b How did the network perform? How well did the network generalize to novel stems? What was the effect of the frequency manipulation? Does the network need to internalize a Blocking Principle? Does the network explicitly represent a default form?

1. Are regulars different? 2. Do regulars implicate operations over variables? Neuropsychological Dissociations Other Domains of Morphology Beyond Sound Similarity Regulars and Associative Memory

1. Are regulars different? 2. Do regulars implicate operations over variables? Neuropsychological Dissociations Other Domains of Morphology Beyond Sound Similarity Regulars and Associative Memory

(Pinker & Ullman 2002)

Beyond Sound Similarity Zero-derived denominals are regular –Soldiers ringed the city –*Soldiers rang the city –high-sticked, grandstanded, … –*high-stuck, *grandstood, … Productive in adults & children Shows sensitivity to morphological structure: [[ stem ]N ø ]V -ed Provides good evidence that sound similarity is not everything But nothing prevents a model from using a richer similarity metric –morphological structure (for ringed) –semantic similarity (for low-lifes)

1. Are regulars different? 2. Do regulars implicate operations over variables? Neuropsychological Dissociations Other Domains of Morphology Beyond Sound Similarity Regulars and Associative Memory

Regulars & Associative Memory Regulars are productive, need not be stored Irregulars are not productive, must be stored But are regulars immune to effects of associative memory? –frequency –over-irregularization Pinker & Ullman: –regulars may be stored –but they can also be generated on-the-fly –‘race’ can determine which of the two routes wins –some tasks more likely to show effects of stored regulars
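
A minimal sketch of the 'race' idea (the lexicon, frequencies, and retrieval probabilities below are invented; this illustrates the dual-route logic rather than Pinker & Ullman's actual model): stored forms are retrieved from associative memory with a frequency-dependent probability, and when retrieval loses the race the default rule applies, which also yields occasional overregularizations such as singed or goed.

import random

# Toy lexicon: stored past-tense forms with rough relative retrieval probabilities.
# Entries and probabilities are invented for illustration.
STORED = {
    "go":    ("went",   0.95),   # high-frequency irregular
    "sing":  ("sang",   0.60),   # lower-frequency irregular
    "walk":  ("walked", 0.30),   # a regular may also be stored (Pinker & Ullman)
}

def rule_route(stem):
    """Default rule: V + [+past] -> stem + -ed (ignoring spelling/phonology)."""
    return stem + "ed"

def past_tense(stem):
    """Race between memory retrieval and the rule route."""
    if stem in STORED:
        form, retrieval_prob = STORED[stem]
        if random.random() < retrieval_prob:   # memory wins the race
            return form
    return rule_route(stem)                    # rule route wins (or no stored entry)

random.seed(1)
for stem in ["go", "sing", "walk", "wug"]:
    # Occasional retrieval failures produce overregularized forms for irregulars;
    # novel stems like "wug" always go through the rule route.
    print(stem, "->", [past_tense(stem) for _ in range(10)])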

Child vs. Adult Impairments Specific Language Impairment –Early claims that regulars show greater impairment than irregulars are not confirmed Pinker & Ullman 2002b –‘The best explanation is that language-impaired people are indeed impaired with rules, […] but can memorize common regular forms.’ –Regulars show consistent frequency effects in SLI, not in controls. –‘This suggests that children growing up with a grammatical deficit are better at compensating for it via memorization than are adults who acquired their deficit later in life.’

1. Are regulars different? 2. Do regulars implicate operations over variables? Neuropsychological Dissociations Other Domains of Morphology Beyond Sound Similarity Regulars and Associative Memory

Neuropsychological Dissociations Ullman et al. –Alzheimer's disease patients: poor memory retrieval; poor irregulars, good regulars –Parkinson's disease patients: impaired motor control, good memory; good irregulars, poor regulars –Striking correlation involving laterality of effect Marslen-Wilson & Tyler 1997 –Normals: past tense primes stem –2 Broca's patients: irregulars prime stems; inhibition for regulars –1 patient with bilateral lesion: regulars prime stems; no priming for irregulars or semantic associates

Morphological Priming Lexical Decision Task –CAT, TAC, BIR, LGU, DOG –press ‘Yes’ if this is a word Priming –facilitation in decision times when related word precedes target (relative to unrelated control) –e.g., {dog, rug} - cat Marslen-Wilson & Tyler 1997 –Regular {jumped, locked} - jump –Irregular {found, shows} - find –Semantic {swan, hay} - goose –Sound {gravy, sherry} - grave

Neuropsychological Dissociations Bird et al. –complain that arguments for selective difficulty with regulars are confounded with the phonological complexity of the word-endings Pinker & Ullman 2002 –weight of evidence still supports dissociation; Bird et al.'s materials contained additional confounds

Brain Imaging Studies Jaeger et al. 1996, Language –PET study of past tense –Task: generate past from stem –Design: blocked conditions –Result: different areas of activation for regulars and irregulars Is this evidence decisive? –task demands very different –difference could show up in network –doesn't implicate variables Münte et al. –ERP study of violations –Task: sentence reading –Design: mixed –Result: regulars: ~LAN; irregulars: ~N400 Is this evidence decisive? –allows possibility of comparison with other violations

1. Are regulars different? 2. Do regulars implicate operations over variables? Neuropsychological Dissociations Other Domains of Morphology Beyond Sound Similarity Regulars and Associative Memory

Low-Frequency Defaults German Plurals –die Straße / die Straßen, die Frau / die Frauen –der Apfel / die Äpfel, die Mutter / die Mütter –das Auto / die Autos, der Park / die Parks, die Schmidts –-s plural: low frequency, used for loan-words, denominals, names, etc. Response –frequency is not the critical factor in a system that focuses on similarity –distribution in the similarity space is crucial –similarity space with islands of reliability: network can learn islands, or network can learn to associate a form with the space between the islands

Similarity Space
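
A minimal sketch of the 'islands of reliability' point (the two-dimensional space, clusters, and class sizes are invented, not Hahn & Nakisa's data): irregular classes form tight islands, the rare -s items are scattered between them, and a nearest-neighbour learner assigns novel items near an island to that island's plural, but items in the space between islands get the low-frequency default.

import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D "similarity space"; all coordinates and class sizes are invented.
island_en     = rng.normal(loc=[1.0, 1.0], scale=0.1, size=(20, 2))    # -(e)n island
island_umlaut = rng.normal(loc=[4.0, 4.0], scale=0.1, size=(20, 2))    # umlaut island
default_s     = np.array([[0.5, 4.0], [2.5, 0.5], [4.5, 1.0],          # sparse -s items,
                          [2.0, 3.0], [0.2, 2.5]])                     # scattered everywhere

X = np.vstack([island_en, island_umlaut, default_s])
labels = ["-(e)n"] * 20 + ["umlaut"] * 20 + ["-s"] * 5

def plural_class(x):
    """Nearest-neighbour classification in the similarity space."""
    return labels[int(np.argmin(np.linalg.norm(X - x, axis=1)))]

# A novel noun close to an island inherits that island's (irregular) plural;
# a novel noun in the space between islands gets the rare -s default.
print(plural_class(np.array([1.1, 0.9])))   # near the -(e)n island
print(plural_class(np.array([4.0, 3.8])))   # near the umlaut island
print(plural_class(np.array([2.7, 0.7])))   # between the islands -> default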

Arabic Broken Plural CvCC –nafs / nufuus 'soul' –qidh / qidaah 'arrow' CvvCv(v)C –xaatam / xawaatim 'signet ring' –jaamuus / jawaamiis 'buffalo' Sound Plural –shuway?ir / shuway?ir-uun 'poet (dim.)' –kaatib / kaatib-uun 'writing (participle)' –hind / hind-aat 'Hind (fem. name)' –ramadaan / ramadaan-aat 'Ramadan (month)'

German Plurals (Hahn & Nakisa 2000)

Syntax, Semantics, & Statistics

Starting Small Simulation How well did the network perform? How did it manage to learn?

Generalization Training Items –Input: … Output: … –Input: … Output: … –Input: … Output: … –Input: … Output: … Test Item –Input: … Output: ? (Humans) (Network)

Generalization fails because learning is local

Generalization succeeds because representations are shared

Negative Evidence Standard Doctrine –Language learners do not receive negative evidence –They must therefore learn only from positive examples –This forces the learner to make constrained generalizations –[even with negative evidence, generalizations must be constrained] Common Suggestion –'Implicit Negative Evidence': the absence of certain examples in the input is taken to be significant –Who do you think John saw __? / Who do you think that John saw __? –Who do you think __ saw John? / *Who do you think that __ saw John? Challenge: how to classify and store input appropriately, in order to detect appropriate generalizations; large memory needed

Feedback via Prediction Simple Recurrent Network –Prediction task provides feedback –System does not need to explicitly store large amounts of data –Challenge is to appropriately encode the feedback signal –If learning rate is set low, learner is protected against noisy input data
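
As a concrete illustration of the points above, here is a minimal Elman-style simple recurrent network sketch (NumPy; the toy two-sentence grammar, network sizes, and learning rate are all invented, so this is not a reimplementation of any particular simulation). The network sees one word at a time, predicts the next word, and learns only from its prediction error, with the previous hidden state copied in as context rather than any stored corpus.

import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: noun verb "." sentences with number agreement (invented grammar).
vocab = ["boy", "boys", "runs", "run", "."]
idx = {w: i for i, w in enumerate(vocab)}
def sentence():
    return ["boy", "runs", "."] if rng.random() < 0.5 else ["boys", "run", "."]

V, H, lr = len(vocab), 10, 0.1
W_xh = rng.normal(0, 0.5, (H, V))   # input word -> hidden
W_hh = rng.normal(0, 0.5, (H, H))   # context (copy of previous hidden) -> hidden
W_hy = rng.normal(0, 0.5, (V, H))   # hidden -> predicted next word
b_h, b_y = np.zeros(H), np.zeros(V)

def one_hot(w):
    v = np.zeros(V); v[idx[w]] = 1.0; return v
def softmax(z):
    e = np.exp(z - z.max()); return e / e.sum()

for _ in range(2000):
    words, context = sentence(), np.zeros(H)
    for t in range(len(words) - 1):
        x, target = one_hot(words[t]), idx[words[t + 1]]
        h = np.tanh(W_xh @ x + W_hh @ context + b_h)
        p = softmax(W_hy @ h + b_y)
        # The only feedback is the prediction error at this step; gradients do
        # not flow back through the copied context (Elman-style training).
        dy = p.copy(); dy[target] -= 1.0
        dh = (W_hy.T @ dy) * (1.0 - h * h)
        W_hy -= lr * np.outer(dy, h);       b_y -= lr * dy
        W_xh -= lr * np.outer(dh, x)
        W_hh -= lr * np.outer(dh, context); b_h -= lr * dh
        context = h

# After training, the prediction following "boys" should favour "run" over "runs".
h = np.tanh(W_xh @ one_hot("boys") + W_hh @ np.zeros(H) + b_h)
print(dict(zip(vocab, np.round(softmax(W_hy @ h + b_y), 2))))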

Feedback via Prediction The prediction problem is very relevant –can be viewed as representing many kinds of syntactic relations –typically, many predictions are held simultaneously; if there's an explicit representation of hierarchy and abstraction, then order of completion is easily predicted –challenge is to avoid spurious competition among dependencies

Elman --> Rohde & Plaut –whatever happens, agreement is not the decisive case Seidenberg on locatives –locatives across languages are highly constrained; generalizations go beyond surface patterns; the Seidenberg/Allen model is given the outline of the solution to the problem –semantics, not statistics, is critical How to use SRN ideas effectively –Structured prediction device encodes hypotheses –How to encode and act upon error signals? –Partially matching features can lead to long-distance feedback –Prediction of alternatives can lead to confirmation or disconfirmation of 1 choice

Infants and Statistical Learning

Saffran, Aslin, & Newport (1996) 8-month-old infants –Passive exposure to a continuous speech sequence for 2 minutes: bidakupadotigolabubidaku… –Test (Experiment #2): bidakubidakubidakubidakubidaku… kupadokupadokupadokupadokupado… –Infants listen longer to unfamiliar sequences –Transitional Probabilities: bi → da → ku, pa → do → ti (high within words, low across word boundaries)
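
A small sketch of the transitional-probability computation, assuming a familiarization stream built by concatenating the three 'words' above in random order (a simplification of the actual stimulus design): within-word TPs come out near 1.0 and across-boundary TPs near 0.33, the contrast the infants are thought to exploit.

from collections import Counter
import random

random.seed(0)

# Build a continuous stream by concatenating the "words" in random order.
words = [["bi", "da", "ku"], ["pa", "do", "ti"], ["go", "la", "bu"]]
stream = []
for _ in range(300):
    stream += random.choice(words)

# Transitional probability TP(y | x) = count(x followed by y) / count(x)
pair_counts = Counter(zip(stream, stream[1:]))
syll_counts = Counter(stream[:-1])

def tp(x, y):
    return pair_counts[(x, y)] / syll_counts[x]

print("within word:  TP(da | bi) =", round(tp("bi", "da"), 2))   # close to 1.0
print("across words: TP(pa | ku) =", round(tp("ku", "pa"), 2))   # close to 0.33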

Marcus et al. (1999) Training –ABA: ga na ga, li ti li –ABB: ga na na, li ti ti Testing –ABA: wo fe wo –ABB: wo fe fe

Aslin & Newport (in press) While some adjacent statistical regularities can be learned, other types of statistical regularities cannot

We have recently been developing a statistical approach to language acquisition and investigating the abilities of human adults, infants, and nonhuman primates to perform the computations that would be required for acquiring properties of natural languages by such a method. Our studies (initially in collaboration with Jenny Saffran) have shown that human adults and infants are capable of performing many of these computations online and with remarkable speed, during the presentation of controlled speech streams in the laboratory. We have also found that adults and infants can perform similar computations on nonlinguistic materials (e.g., music), and (in collaboration with Marc Hauser) that nonhuman primates can perform the simplest of these computations. However in our recent studies, when tested on more complex computations involving non-adjacent sounds, humans show strong selectivities (they can perform certain computations, but fail at others), corresponding to the patterns which natural languages do and do not exhibit. Primates are not capable of performing some of these more difficult computations. Additional recent studies examine how statistics can be used to form non-statistical generalizations and to regularize irregular structures, and how the computations we have hypothesized for word segmentation extend to acquiring syntactic phrase structure. Overall we feel that this approach may provide an important mechanism for learning certain aspects of language, particularly when combined with an understanding of the ways in which input statistics may be selectively extracted or altered as they are acquired. In addition, the constraints of learners in performing differing types and complexities of computations may provide part of the explanation for which learners can acquire human languages, and why languages have some of the properties they have. (Newport & Aslin, Dec. 2003)

A Little Syntax

Gold (1967) Hypothetical classes of languages –#1: {A}, {A, AA}, {A, AA, AAA}, {A, AA, AAA, AAAA} –#2: {A}, {A, AA}, {A, AA, AAA}, {A, AA, AAA, AAAA, …} (the last language is infinite) How could a learner figure out the target language, based on positive-only examples ('text presentation')? –#1: a learner that always conjectures the smallest language consistent with the strings seen so far is guaranteed to converge –#2: under class #2, there's no way to guarantee convergence
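
A toy sketch of the intuition, not Gold's proof: a learner that always conjectures the smallest language consistent with the strings seen so far succeeds on finite targets, but on the infinite language it keeps revising its guess forever, because at every point the data it has seen are still consistent with some finite language.

# Nested languages over the alphabet {A}: L_k = {A, AA, ..., A^k}; L_inf = {A, AA, AAA, ...}.
# A conservative learner conjectures the smallest language consistent with the
# positive examples seen so far, i.e. L_k where k is the longest string observed.

def conjecture(observed_lengths):
    return max(observed_lengths)

# Finite target L_3: a text presentation eventually shows every string of the language.
text_finite = [1, 3, 2, 3, 1, 2, 3, 3]          # lengths of the presented strings
print("target L_3  :", [conjecture(text_finite[:i + 1]) for i in range(len(text_finite))])
# -> the guess settles on 3 and never changes again

# Infinite target: the text contains arbitrarily long strings, so the guess keeps
# growing; at every finite point the data are still consistent with some finite
# language, and this learner never converges on L_inf.
text_infinite = [1, 2, 4, 7, 11, 16, 25]
print("target L_inf:", [conjecture(text_infinite[:i + 1]) for i in range(len(text_infinite))])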

Baker (1979) Alternating Verbs –John gave a cookie to the boy. John gave the boy a cookie. –Mary showed some photos to her family. Mary showed her family some photos. Non-Alternating Verbs –John donated a painting to the museum. *John donated the museum a painting. –Mary displayed her art collection to the visitors. *Mary displayed the visitors her art collection. Learnability problem: how to avoid overgeneralization

Seidenberg (1997, Science) Locative Verb Constructions –John poured the water into the cup *John poured the cup with water –*Sue filled the water into the glass Sue filled the glass with water –Bill loaded the apples onto the truck Bill loaded the truck with apples “Connectionist networks are well suited to capturing systems with this character. Importantly, a network configured as a device that learns to perform a task such as mapping from sound to meaning will act as a discovery procedure, determining which kinds of information are relevant. Evidence that such models can encode precisely the right combinations of probabilistic constraints is provided by Allen (42), who implemented a network that learns about verbs and their argument structures from naturalistic input.” (p. 1602)

Seidenberg (Science, 3/14/97) “Research on language has arrived at a particularly interesting point, however, because of important developments outside of the linguistic mainstream that are converging on a different view of the nature of language. These developments represent an important turn of events in the history of ideas about language.” (p. 1599)

Seidenberg (Science, 3/14/97) “A second implication concerns the relevance of poverty-of-the-stimulus arguments to other aspects of language. Verbs and their argument structures are important, but they are language specific rather than universal properties of languages and so must be learned from experience.” (p. 1602)

Allen’s Model Learns associations between (i) specific verbs & argument structures and (ii) semantic representations Feature encoding for verbs: 360 features [eat]: +act, +cause, +consume, etc. [John]: +human, +animate, +male, -automotive-vehicle

Allen’s Model Learns associations between (i) specific verbs & argument structures and (ii) semantic representations Training set: 1200 ‘utterance types’ taken from caretaker speech in the Peter corpus (CHILDES)

Allen’s Model Fine-grained distinction between hit, carry John kicked Mary the ball *John carried Mary the basket [kick]: +cause, +apply-force, +move, +travel, +contact, +hit-with-foot, +strike, +kick, +instantaneous-force, +hit [carry]: +cause, +apply-force, +move, +travel, +contact, +carry, +support, +continuous-force, +accompany

Allen’s Model Fine-grained distinction between hit, carry John kicked Mary the ball *John carried Mary the basket “This behavior shows crucially that the network is not merely sensitive to overall semantic similarity: rather, the network has organized the semantic space such that some features are more important than other.” (p. 5)
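
A hedged sketch of the kind of learning being credited to Allen's model, not the model itself (the verbs, the seven-feature encoding, and the acceptability labels below are invented, and plain logistic regression stands in for the network): the learner has to discover which semantic features, such as instantaneous vs. continuous force, predict the double-object frame, and it then generalizes to held-out verbs on the basis of those features rather than overall similarity.

import numpy as np

# Hypothetical mini feature set (Allen's real model used ~360 features).
features = ["cause", "apply-force", "move", "contact",
            "instantaneous-force", "continuous-force", "accompany"]

# Invented training verbs: does the verb allow the double-object frame
# ("John kicked Mary the ball")?  1 = yes, 0 = no.
train = {
    "kick":  ([1, 1, 1, 1, 1, 0, 0], 1),
    "throw": ([1, 1, 1, 0, 1, 0, 0], 1),
    "hit":   ([1, 1, 0, 1, 1, 0, 0], 1),
    "carry": ([1, 1, 1, 1, 0, 1, 1], 0),
    "push":  ([1, 1, 1, 1, 0, 1, 0], 0),
    "drag":  ([1, 1, 1, 1, 0, 1, 1], 0),
}
X = np.array([v[0] for v in train.values()], dtype=float)
y = np.array([v[1] for v in train.values()], dtype=float)

# Logistic regression by gradient descent: the learner must discover WHICH
# features matter (instantaneous vs. continuous force), not overall similarity.
w, b, lr = np.zeros(len(features)), 0.0, 0.5
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= lr * X.T @ (p - y) / len(y)
    b -= lr * np.mean(p - y)

# Held-out verbs (also invented): a "toss"-like verb vs. a "haul"-like verb.
for verb, feats in {"toss": [1, 1, 0, 0, 1, 0, 0], "haul": [1, 1, 1, 0, 0, 1, 1]}.items():
    p = 1 / (1 + np.exp(-(np.array(feats) @ w + b)))
    print(verb, "double-object acceptable?", round(float(p), 2))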

Challenges Allen’s results are impressive; the model is interesting in the way that it poses the learning task as a selection process (the linking rules do not emerge from nowhere) Fine-grained distinctions in English ‘Concealed’ distinctions in Korean Reason for universals

Challenges Fine-grained distinctions, e.g. in English –pour the water into the glass –pour the water –the poured water –stand the lamp on the floor –*stand the lamp –*the stood lamp

Challenges ‘Concealed’ distinctions, e.g. in Korean –pour the water into the glass –*pour the glass with water –pile the books onto the shelf –*pile the shelf with books –*pour-put the glass with water –pile-put the shelf with books

Challenges Universals, parametric connections - why should they exist and be stable?

(Peña et al. 2002)