Using Backprop to Understand Aspects of Cognitive Development PDP Class Feb 8, 2010.


Back propagation algorithm
Propagate activation forward.
Propagate “error” backward.
Calculate ‘weight error derivative’ terms: δ_r a_s
Change weights after
–Each pattern
–A batch of patterns
At the output level: δ_i = (t_i − a_i) f′(net_i)
At other levels: δ_j = f′(net_j) Σ_i δ_i w_ij, etc.
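The pass above can be sketched concretely in a few lines of Python. This is a minimal illustration with logistic units; the names (`backprop_step`, `W_hid`, `W_out`, `lrate`) are made up for the sketch and are not from the course software:

```python
import math

def f(net):
    """Logistic activation."""
    return 1.0 / (1.0 + math.exp(-net))

def fprime(a):
    """Derivative of the logistic, written in terms of the activation a."""
    return a * (1.0 - a)

def backprop_step(x, t, W_hid, W_out, lrate=0.1):
    # Propagate activation forward.
    net_h = [sum(w * xi for w, xi in zip(row, x)) for row in W_hid]
    a_h = [f(n) for n in net_h]
    net_o = [sum(w * ah for w, ah in zip(row, a_h)) for row in W_out]
    a_o = [f(n) for n in net_o]
    # Propagate "error" backward.
    # Output level: delta_i = (t_i - a_i) * f'(net_i)
    d_o = [(ti - ai) * fprime(ai) for ti, ai in zip(t, a_o)]
    # Other levels: delta_j = f'(net_j) * sum_i delta_i * w_ij
    d_h = [fprime(a_h[j]) * sum(d_o[i] * W_out[i][j] for i in range(len(d_o)))
           for j in range(len(a_h))]
    # Weight error derivative for w_rs is delta_r * a_s; change weights
    # (here after each pattern, rather than after a batch).
    for i, di in enumerate(d_o):
        for j, ah in enumerate(a_h):
            W_out[i][j] += lrate * di * ah
    for j, dj in enumerate(d_h):
        for s, xs in enumerate(x):
            W_hid[j][s] += lrate * dj * xs
    return a_o
```

Repeated calls on the same pattern should drive the output toward the target.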

Variants/Embellishments to back propagation
We can include weight decay and momentum:
Δw_rs = ε Σ_p δ_rp a_sp − ω w_rs + α Δw_rs(prev)
An alternative error measure has both conceptual and practical advantages:
CE_p = − Σ_i [ t_ip log(a_ip) + (1 − t_ip) log(1 − a_ip) ]
If targets are actually probabilistic, minimizing CE_p causes activations to match the probability of the observed target values. This also eliminates the ‘pinned output unit’ problem.
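A hedged sketch of both embellishments: the cross-entropy measure, and a weight update with decay and momentum. Function and parameter names (`cross_entropy`, `update_with_momentum`, `eps`, `wd`, `alpha`) are illustrative, not from the slides:

```python
import math

def cross_entropy(targets, acts):
    """CE_p = -sum_i [ t_i log a_i + (1 - t_i) log(1 - a_i) ]."""
    return -sum(t * math.log(a) + (1 - t) * math.log(1 - a)
                for t, a in zip(targets, acts))

def update_with_momentum(w, err_deriv, dw_prev, eps=0.1, wd=0.0001, alpha=0.9):
    """One weight change with decay and momentum:
    dw = eps * (summed error derivative) - wd * w + alpha * dw(prev).
    Returns the new weight and the step (to pass in as dw_prev next time)."""
    step = eps * err_deriv - wd * w + alpha * dw_prev
    return w + step, step
```

Note that with logistic output units, the derivative of CE with respect to net_i reduces to (t_i − a_i): the f′(net) factor drops out, which is why a saturated (‘pinned’) output unit can still receive a large error signal.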

Is backprop biologically plausible? Neurons do not send error signals backward across their weights through a chain of neurons, as far as anyone can tell. But we shouldn’t be too literal-minded about the actual biological implementation of the learning rule. Some neurons appear to use error signals, and there are ways to use differences between activation signals to carry error information. (We will explore this in a later lecture.)

Why is back propagation important?
Provides a procedure that allows networks to learn weights that can solve any deterministic input-output problem.
–Contrary to expectation, it does not get stuck in local minima except in cases where the network is exceptionally tightly constrained.
–Allows networks with multiple hidden layers to be trained, although learning tends to proceed slowly (later we will learn about procedures that can fix this).
Allows networks to learn how to represent information as well as how to use it.
Raises questions about the nature of representations and of what must be specified in order to learn them.

The Time-Course of Cognitive Development
Networks trained with back-propagation address several issues in development, including:
–Whether innate knowledge is necessary as a starting point for learning.
–Aspects of the time course of development.
–What causes changes in the pattern of responses children make at different times during development?
–What allows a learner to reach the point of being ready to learn something s/he previously was not ready to learn?

Two Example Models
Rumelhart’s semantic learning model
–Addresses most of the issues above
–Available as the “semnet” script in the bp directory
Model of child development in a ‘naïve physics’ task (Piaget’s balance scale task)
–Addresses stage transitions and readiness to learn new things
–We will not get to this; see readings if interested

Quillian’s (1969) Hierarchical Propositional Model

The Rumelhart (1990) Model

The Training Data: All propositions true of items at the bottom level of the tree, e.g.: Robin can {fly, move, grow}

The Rumelhart Model: Target output for ‘robin can’ input

The Rumelhart Model

Experience
[Figure: network outputs shown Early, Later, and Later Still in training]

Inference and Generalization in the PDP Model A semantic representation for a new item can be derived by error propagation from given information, using knowledge already stored in the weights.

Start with a neutral representation on the representation units. Use backprop to adjust the representation to minimize the error.
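The procedure just described — freeze the weights and propagate error back to the representation units themselves — can be sketched as below. For brevity this sketch assumes a single linear layer from representation to attributes; the actual model has more layers and logistic units, and the name `infer_representation` is made up:

```python
def infer_representation(W, target, steps=500, lrate=0.1):
    """Adjust a representation vector, with weights W held fixed,
    to minimize squared error on the given (partial) target pattern."""
    rep = [0.5] * len(W[0])  # start with a neutral representation
    for _ in range(steps):
        out = [sum(w * r for w, r in zip(row, rep)) for row in W]
        err = [t - o for t, o in zip(target, out)]
        # Gradient step on the representation units; weights are unchanged.
        for j in range(len(rep)):
            rep[j] += lrate * sum(err[i] * W[i][j] for i in range(len(W)))
    return rep
```

Given enough steps, the derived representation reproduces whatever pattern the fixed weights can map onto the given information.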

The result is a representation similar to that of the average bird…

Use the representation to infer what this new thing can do.

Some Phenomena in Conceptual Development
Progressive differentiation of concepts
Illusory correlations and U-shaped developmental trajectories
Domain- and property-specific constraints on generalization
Reorganization of conceptual knowledge

What Drives Progressive Differentiation?
Waves of differentiation reflect sensitivity to patterns of coherent covariation of properties across items.
Patterns of coherent covariation are reflected in the principal components of the property covariance matrix.
Figure shows attribute loadings on the first three principal components:
–1. Plants vs. animals
–2. Birds vs. fish
–3. Trees vs. flowers
Same color = features covary in a component; different color = anti-covarying features.
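For concreteness, the first principal component of a property covariance matrix can be extracted by power iteration. This is a generic sketch, not code from the model; `first_component` is an illustrative name:

```python
def first_component(cov, iters=200):
    """Power iteration: repeatedly multiply a vector by the covariance
    matrix and renormalize; it converges to the leading eigenvector,
    i.e. the first principal component."""
    v = [1.0] * len(cov)
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(len(v)))
             for i in range(len(cov))]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v
```

On the property covariance matrix of the training corpus, the loadings of this component separate plants from animals, mirroring the first wave of differentiation.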

Coherent Covariation
The tendency of properties of objects to co-occur in clusters, e.g.:
–Has wings
–Can fly
–Is light
Or:
–Has roots
–Has rigid cell walls
–Can grow tall
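A toy illustration of the contrast (all names made up): coherent properties are determined jointly by which cluster an item belongs to, while incoherent properties are assigned independently at random:

```python
import random

def make_item(cluster, n_coherent=3, n_incoherent=3, rng=random):
    """Build a property vector for one item. The first n_coherent
    properties all follow the item's cluster (so they co-occur across
    items); the remaining n_incoherent properties are independent noise."""
    coherent = [1 if cluster == 0 else 0] * n_coherent
    incoherent = [rng.randint(0, 1) for _ in range(n_incoherent)]
    return coherent + incoherent
```

Across many such items, the coherent properties covary perfectly while the incoherent ones carry no shared structure for the network to pick up.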

Coherence Training Patterns
No labels are provided.
Each item and each property occurs with equal frequency.
[Table: items × properties (is/can/has), split into a Coherent block and an Incoherent block]

Effects of Coherence on Learning
[Figure: learning curves for Coherent Properties vs. Incoherent Properties]

Effect of Coherence on Representation

Effects of Coherent Variation on Learning in Connectionist Models
Attributes that vary together create the acquired concepts that populate the taxonomic hierarchy, and determine which properties are central and which are incidental to a given concept.
–Labeling of these concepts or their properties is in no way necessary, but it may contribute additional ‘covarying’ information, and can affect the pattern of differentiation.
Arbitrary properties (those that do not co-vary with others) are very difficult to learn.
–And it is harder to learn names for concepts that are only differentiated by such arbitrary properties.

Sensitivity to Coherence Requires Convergence

Illusory Correlations Rochel Gelman found that children think that all animals have feet. –Even animals that look like small furry balls and don’t seem to have any feet at all.

[Figure: a typical property that a particular object lacks (e.g., pine has leaves) vs. an infrequent, atypical property]

Domain Specificity What constraints are required for development and elaboration of domain-specific knowledge? –Are domain specific constraints required? –Or are there general principles that allow for acquisition of conceptual knowledge of all different types?

Differential Importance (Marcario, 1991) 3-4 yr old children see a puppet and are told he likes to eat, or play with, a certain object (e.g., top object at right) –Children then must choose another one that will “be the same kind of thing to eat” or that will be “the same kind of thing to play with”. –In the first case they tend to choose the object with the same color. –In the second case they will tend to choose the object with the same shape.

–Can the knowledge that one kind of property is important for one type of thing while another is important for a different type of thing be learned? –They can in the PDP model, since it is sensitive to domain-specific patterns of coherent covariation.

Adjustments to Training Environment
Among the plants:
–All trees are large
–All flowers are small
–Either can be bright or dull
Among the animals:
–All birds are bright
–All fish are dull
–Either can be small or large
In other words:
–Size covaries with properties that differentiate different types of plants
–Brightness covaries with properties that differentiate different types of animals

Testing Feature Importance
After partial learning, the model is shown eight test objects:
–Four “Animals”: all have skin; one is large and bright, one small and bright, one large and dull, one small and dull.
–Four “Plants”: all have roots; same four combinations as above.
Representations are generated by using back-propagation to adjust the representation units.
Representations are then compared to see which ‘animals’ are treated as most similar, and which ‘plants’ are treated as most similar.
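The comparison step can be as simple as a distance measure over the derived representation vectors. A minimal sketch (the name `distance` is illustrative):

```python
import math

def distance(rep_a, rep_b):
    """Euclidean distance between two representation vectors; smaller
    distance means the model treats the two items as more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(rep_a, rep_b)))
```

Pairwise distances among the eight test items then reveal which dimension (size or brightness) dominates similarity within each domain.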

The Rumelhart Model

Similarities of Obtained Representations
Size is relevant for Plants; Brightness is relevant for Animals.

Additional Properties of the model
The model is sensitive to amount and type of exposure, addressing frequency effects and expertise effects, and capturing different types of expertise.
The model’s pattern of generalization varies as a function of the type of property as well as the domain.
The model can reorganize its knowledge:
–It will first learn about superficial appearance properties if these are generally available; later, it can re-organize its knowledge based on coherent covariation among properties that occur only in specific contexts.