Connectionist Time and Dynamic Systems Time in One Architecture? Modeling Word Learning at Two Timescales

Connectionist Time and Dynamic Systems Time in One Architecture? Modeling Word Learning at Two Timescales
Jessica S. Horst, Bob McMurray, Larissa K. Samuelson
Dept. of Psychology, University of Iowa

Two Time Scales in Neural Networks
Connectionist and dynamical systems accounts:
- stress change over time
- complement each other in timescale
  - Dynamic Systems: online processes
  - Connectionist Networks: long-term learning
Many domains of development require both timescales.
Example: language development requires
- sensitivity to the brief and sequential nature of the input
- slower developmental processes

Two Time Scales in Language Acquisition
Word learning is often attributed to fast mapping: a quick link between a novel name and a novel object (e.g., Carey, 1978).
But recent empirical data suggest that fast mapping and word learning may represent two distinct time scales (Horst & Samuelson, 2005, April).
- Fast Mapping: quick process emerging in the moment.
- Word Learning: gradual process over the course of development.
We capture both timescales in a recurrent network.

The Architecture (McMurray & Spivey, 2000)
- Activation feeds from input layers to decision layers.
- Decision units compete via inhibition.
- Activation feeds back to input layers.
- The cycle continues until the system settles.
- Unsupervised Hebbian learning occurs on every cycle.
[Figure: initial state (before learning), showing auditory inputs, visual inputs, and the decision units (hidden) layer.]
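The settling cycle described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`normalize`, `settle`), the square-and-normalize rule standing in for inhibition, the multiplicative feedback, and the toy 2-unit weights are all hypothetical. Learning is left out here so that only the fast, within-trial dynamics are shown.

```python
def normalize(values):
    """Scale non-negative values so they sum to 1 (unchanged if all zero)."""
    total = sum(values)
    return [v / total for v in values] if total > 0 else list(values)


def settle(audio, visual, w_audio, w_visual, max_cycles=100, tol=1e-6):
    """Cycle activation between input and decision layers until it stabilizes.

    w_audio[i][j] and w_visual[i][j] are weights from input unit i to
    decision unit j. Returns (decision, audio, visual, cycles_used).
    """
    n_dec = len(w_audio[0])
    audio, visual = normalize(audio), normalize(visual)
    decision = [1.0 / n_dec] * n_dec  # start with no preference
    cycle = 0
    for cycle in range(1, max_cycles + 1):
        # 1. Feedforward: each decision unit sums its weighted inputs.
        net = [
            sum(a * w_audio[i][j] for i, a in enumerate(audio))
            + sum(v * w_visual[i][j] for i, v in enumerate(visual))
            for j in range(n_dec)
        ]
        # 2. Competition (assumed rule): squaring then normalizing lets
        #    strong units suppress weak ones, standing in for inhibition.
        new_decision = normalize([(d + n) ** 2 for d, n in zip(decision, net)])
        # 3. Feedback (assumed rule): inputs are scaled by the support they
        #    receive from the decision layer, then renormalized.
        audio = normalize([
            a * sum(w_audio[i][j] * new_decision[j] for j in range(n_dec))
            for i, a in enumerate(audio)
        ])
        visual = normalize([
            v * sum(w_visual[i][j] * new_decision[j] for j in range(n_dec))
            for i, v in enumerate(visual)
        ])
        # 4. Stop once the decision layer has settled.
        change = max(abs(x - y) for x, y in zip(new_decision, decision))
        decision = new_decision
        if change < tol:
            break
    return decision, audio, visual, cycle


# Toy "trained" network: name 0 and object 0 both map to decision unit 0.
w = [[0.9, 0.1], [0.1, 0.9]]
# Present name 0 with both objects visible.
decision, audio_out, visual_out, cycles = settle([1, 0], [1, 1], w, w)
```

With these toy weights, decision unit 0 wins the competition and feedback shifts visual activation toward object 0, which is the sense in which the settled state reflects both the inputs and the stored weights.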

Online decision dynamics reflect auditory and visual competitors.

The Model
- 15 auditory and 15 visual units
- 90 decision units
- Names presented singly with a variable number of objects
- Name-decision and object-decision associations strengthened via learning
- After 4000 training trials, the network forms localist representations
- Learns name-object links and learns to ignore visual competitors
[Figure panels: intermediate state (during learning); end state (post learning).]

[Figure: connection strength between auditory input units and decision units.]

Two Time Scales
Fast: Moment by Moment
- Online information integration and constraint satisfaction (e.g., McClelland & Elman, 1986; Dell, 1986)
- Reaches a stable pattern of activation based on auditory and visual inputs and stored knowledge (weights)
- Model makes correct name-object links based on the latest input
Slow: Over the Long-Term
- Unsupervised Hebbian learning
- Associates words with visual targets
- Learns to ignore visual competitors

Dependent Time Scales
The two time scales are not independent.
- Long-term learning depends critically on the dynamics of the fast time scale.
- Competition between decision units ensures pseudo-localist representations, which are critical for Hebbian learning (e.g., Rumelhart & Zipser, 1986).
- Learning occurs on each cycle and influences processing cycle-by-cycle and trial-by-trial.
- Accumulated learning across trials leads to learning on the long-term time scale (i.e., word learning).
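To make the dependence concrete, here is a minimal sketch of why a sharpened (pseudo-localist) decision pattern matters for Hebbian learning. It assumes the plain Hebbian rule, weight change = rate x input activation x decision activation; the learning rate, the single input unit, and the two fixed decision patterns are hypothetical values chosen for illustration, not taken from the model.

```python
def hebbian_update(weights, inputs, decision, rate=0.05):
    """Strengthen each input-to-decision weight in proportion to co-activation."""
    return [
        [w + rate * a * d for w, d in zip(row, decision)]
        for row, a in zip(weights, inputs)
    ]


def train(decision_pattern, n_cycles=100):
    """Apply the update on every cycle while one name unit is active."""
    weights = [[0.0, 0.0]]  # one input unit (a name), two decision units
    for _ in range(n_cycles):
        weights = hebbian_update(weights, [1.0], decision_pattern)
    return weights[0]


# After competition: one decision unit dominates.
sharp = train([0.95, 0.05])
# Without competition: both decision units are equally active.
flat = train([0.5, 0.5])

sharp_gap = sharp[0] - sharp[1]  # the name becomes tied to one decision unit
flat_gap = flat[0] - flat[1]     # no unit is favored, so nothing is learned
```

Each single update is small, but the gap between the two weights grows only when competition has already picked a winner. Accumulated over many cycles and trials, that gap is what the slow time scale amounts to in this sketch.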

Empirical Results

Fast Time Scale
- 24-month-old children
- Saw 2 familiar objects and 1 novel object
- Asked to get familiar and novel objects (e.g., “get the cow!” or “get the yok!”)
- Example array: cow (familiar), block (familiar), yok (novel)
Children were excellent at fast mapping (finding the referent of novel and familiar words in the moment).
[Figure: fast-mapping results; bars marked ***.]

Slow Time Scale
- After a 5-minute delay, children were asked to pick a newly fast-mapped name (e.g., “get the yok!”)
- Example array: yok (target), fode (named foil), unnamed foil (previously seen)
Children were unable to retain mappings after a 5-minute delay.
[Figure: retention results; marked ***.]

Replication
Initial findings replicated with simpler tasks: effect of number of names or trials?
Children’s difficulty in retaining newly fast-mapped names is not related to the number of names or trials.

Replication       Design                                                Fast Mapping   Retention
#1 (N = 12)       1 novel name, 8 familiar names, 7 preference trials   9/12 **        4/9 n.s.
#2 (N = 12)       1 novel name, 2 familiar names                        7/12 *         4/7 n.s.

* Binomial, p < .05; ** Binomial, p < .01
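For readers unfamiliar with the binomial test behind these markers, the sketch below shows the arithmetic. It rests on two assumptions the slide does not state: that the test is one-tailed, and that chance is 1/3 (three objects per trial, as in the original task). With a different chance level or a two-tailed test the p-values would differ, so this illustrates the method rather than reconstructing the authors' analysis.

```python
from math import comb


def binomial_tail(successes, n, chance):
    """P(X >= successes) for X ~ Binomial(n, chance): a one-tailed p-value."""
    return sum(
        comb(n, k) * chance**k * (1 - chance) ** (n - k)
        for k in range(successes, n + 1)
    )


# Counts taken from the slide; the chance level of 1/3 is an assumption.
p_fast_mapping = binomial_tail(9, 12, 1 / 3)  # 9 of 12 children correct
p_retention = binomial_tail(4, 9, 1 / 3)      # 4 of 9 children correct
```

Under these assumptions, 9 of 12 falls below the .01 threshold while 4 of 9 does not reach .05, in line with the ** and n.s. labels on those two counts.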

Simulations

- 20 networks initialized with random weights
- 15-word lexicon (names and objects): 5 familiar words, 5 novel words, 5 held out
- Trained on the 5 familiar items for 5000 epochs
- Items presented in random order
- Run in the Fast Mapping Experiment:
  - 10 fast mapping trials (5 familiar, 5 novel)
  - 5 retention trials
- Learning was not turned off during the experiment.
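The simulation design above can be laid out as a trial schedule. The sketch below is a hypothetical reading of the slide, not the authors' code: it assumes an epoch is one pass through the 5 familiar items in a fresh random order, that retention trials test the 5 novel words, and that words are identified by integer index. `build_protocol` and its parameters are illustrative names.

```python
import random


def build_protocol(n_words=15, n_epochs=5000, seed=0):
    """Split the lexicon and build training, fast-mapping, and retention lists."""
    rng = random.Random(seed)  # one seed per simulated network
    words = list(range(n_words))
    rng.shuffle(words)
    familiar, novel, held_out = words[:5], words[5:10], words[10:15]

    # Training: each epoch presents the familiar items once, in random order
    # (assumed meaning of "epoch").
    training = []
    for _ in range(n_epochs):
        training.extend(rng.sample(familiar, len(familiar)))

    # Experiment: 10 fast-mapping trials (5 familiar, 5 novel), shuffled,
    # followed by 5 retention trials (assumed to test the novel words).
    fast_mapping = familiar + novel
    rng.shuffle(fast_mapping)
    retention = rng.sample(novel, len(novel))

    return {
        "familiar": familiar,
        "novel": novel,
        "held_out": held_out,
        "training": training,
        "fast_mapping": fast_mapping,
        "retention": retention,
    }


# 20 networks, each with its own random split and trial order.
protocols = [build_protocol(seed=s, n_epochs=10) for s in range(20)]
```

The held-out words never appear in any trial list, which is what lets them serve as a baseline when weight changes are compared later.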

How The Model Behaves
Fast Time Scale:
- Model succeeded on both types of fast-mapping trials
- Model behavior patterned with empirical results

Slow Time Scale: The model fails to “retain” the newly learned words after a “delay”.
[Figure: retention performance, with the chance level marked.]

How The Model “Thinks”
Analyses of weight matrices revealed that relatively little learning occurred during the test phase.
[Figure: change (RMS) in portions of the weight matrix for familiar, novel, and control words, after learning and after test; y-axis: squared deviations.]
[Figure: temporal dynamics of processing.]
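A weight-change analysis of this kind can be sketched as follows. This assumes "change (RMS)" means the root-mean-square difference between two snapshots of the weight matrix, computed over the rows belonging to one word group. The function name, the row-per-word layout, and the toy matrices are hypothetical and serve only to show the computation.

```python
from math import sqrt


def rms_change(before, after, rows):
    """Root-mean-square weight change over the given rows (one row per word)."""
    squared = [
        (after[r][c] - before[r][c]) ** 2
        for r in rows
        for c in range(len(before[r]))
    ]
    return sqrt(sum(squared) / len(squared))


# Toy snapshots: rows 0 and 1 stand in for two word groups (illustrative only).
before = [[0.0, 0.0], [1.0, 1.0]]
after = [[3.0, 4.0], [1.0, 1.0]]

changed_group = rms_change(before, after, rows=[0])    # sqrt((9 + 16) / 2)
unchanged_group = rms_change(before, after, rows=[1])  # weights did not move
```

Comparing this measure across the familiar, novel, and control rows, before and after the test phase, is one way to quantify how much learning a phase produced.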

[Figure: connection strength prior to the experiment and after the experiment.]

Conclusions
Two time scales captured in a single architecture:
- Fast, online: fast mapping
- Slow, long-term: word learning
The model replicated the empirical findings:
- Excellent word learning and fast mapping
- Poor “retention”
The model has sufficient knowledge to select the referent at a given moment in time, given auditory and visual input and stored knowledge (weights), but not enough to subsequently “know” the word.

Conclusions
In-the-moment learning:
- Subtly biases behavior
- Combined with activation dynamics, yields the correct response
- Does not provide robust, context-independent word knowledge (in the short term)
Continued training on fast-mapped words (i.e., 5000 epochs) makes them familiar words.
Accumulation of this learning provides robust, context-independent word knowledge over development.

Take-Home Messages
1) A fast-mapped word is not a known word… but a known word is known because it has been fast-mapped many, many times.
2) Understanding development requires models that integrate both short-term dynamic processes and long-term learning.

References
Carey, S. (1978). The child as word learner. In M. Halle, J. Bresnan & A. Miller (Eds.), Linguistic Theory and Psychological Reality (pp ). Cambridge, MA: MIT Press.
Dell, G. S. (1986). A spreading-activation theory of retrieval in sentence production. Psychological Review, 93(3)
Horst, J. S. & Samuelson, L. K. (2005, April). Slow Down: Understanding the Time Course Behind Fast Mapping. Poster session presented at the 2005 Biennial Meeting of the Society for Research in Child Development, Atlanta, GA.
McClelland, J. & Elman, J. (1986). The TRACE Model of Speech Perception. Cognitive Psychology, 18(1),
McMurray, B. & Spivey, M. (2000). The Categorical Perception of Consonants: The Interaction of Learning and Processing. The Proceedings of the Chicago Linguistics Society, 34(2),
Rumelhart, D. & Zipser, D. (1986). Feature Discovery By Competitive Learning. In D. Rumelhart & J. McClelland (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, 1, Cambridge, MA: MIT Press.

Acknowledgements
The authors would like to thank Joseph Toscano for programming assistance and support. This work was supported by NICHD Grant R01-HD to LKS.