An Unsupervised Connectionist Model of Rule Emergence in Category Learning Rosemary Cowell & Robert French LEAD-CNRS, Dijon, France EC FP6 NEST Grant

Key idea driving our research: a rule for categorization has emerged when the observer disregards a significant subset of an object's features and focuses on only a reduced subset of them in order to determine the object's category.

[Figure: number of features attended to as a rule emerges, contrasting a "Eureka moment" with a "no Eureka moment".]

Goal: To develop an unsupervised learning system from which simple rules emerge. Young infants do not receive “rule instruction” for category learning. Animals have even less rule instruction than human infants. We are not claiming that ALL rule learning occurs in this manner, but rather that some, especially for young infants, does.

Kohonen Network. A new object with features (f1, f2, f3) is presented. Each category node (Category A, Category B, Category C) has a weight vector: is (f1, f2, f3) closest to (w11, w21, w31), to (w12, w22, w32), or to (w13, w23, w33)? If, say, (w13, w23, w33) is the winner, we modify it to be a little closer to (f1, f2, f3) than before.
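A minimal sketch of this winner-take-all step: find the weight vector closest to the input and nudge it toward the input. The 3x3 size, random initial weights, and learning rate are illustrative assumptions, not the model's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((3, 3))                     # column j = weight vector for category j

def kohonen_step(f, W, lr=0.1):
    """Pick the weight column closest to f and nudge it toward f."""
    distances = np.linalg.norm(W - f[:, None], axis=0)   # distance to each column
    winner = int(np.argmin(distances))                   # best-matching category
    W[:, winner] += lr * (f - W[:, winner])              # move winner toward input
    return winner

f = np.array([0.9, 0.1, 0.8])              # a new object (f1, f2, f3)
print("winning category:", "ABC"[kohonen_step(f, W)])
```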

[Animated scatterplot sequence over many slides: "No. of calories consumed per day" vs. "Life expectancy", with the three weight vectors (w11, w21, w31), (w12, w22, w32), and (w13, w23, w33) migrating toward the data clusters.]

The new weight vectors are now the centroids of each category.

Let’s see how we could extract rules from this kind of system.

A first idea: a copy of the Kohonen network (a competitive learning network) watches the weights of the original Kohonen network.

Category-Determining Features. A category-determining feature is one that is sufficient to categorize an object, e.g., if f3 is present, then the object is in Category C and not in Category A or B. In other words, if the object is in Category C, f3 will match w33 but will not match w31 or w32.

Category-Irrelevant Features. A category-irrelevant feature is one that is not sufficient to categorize an object because it is shared by objects in two or more categories, e.g., if f3 is present, the object may be in Category A or C. In other words, f3 matches both w31 and w33.
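These two definitions can be stated compactly: a feature is category-determining if it matches the weight to exactly one category node, and category-irrelevant if it matches two or more. A hedged sketch follows; the weight values and the matching threshold are illustrative assumptions, not quantities from the model.

```python
import numpy as np

# Illustrative trained weights W[i, j]: strength of feature i for category j.
# Rows: eyes, wings, beak. Columns: Cat, Bird, Bat.
W = np.array([[0.9, 0.9, 0.9],    # eyes:  matches all three categories
              [0.0, 0.9, 0.9],    # wings: matches Bird and Bat
              [0.0, 0.9, 0.0]])   # beak:  matches Bird only

def feature_type(W, i, threshold=0.5):
    """Count how many category weights this feature matches."""
    n_matches = int((W[i] > threshold).sum())
    return "category-determining" if n_matches == 1 else "category-irrelevant"

for i, name in enumerate(["eyes", "wings", "beak"]):
    print(name, "->", feature_type(W, i))
```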

A concrete example: a Cat-Bird-Bat categorizer. [Diagram: an input stimulus with feature units "eyes", "wings", and "beak" connected to the category nodes "Bird", "Cat", and "Bat".]

A category-determining feature is one that is sufficient to categorize an object, e.g., if "beak", then "bird" and not "bat" or "cat".

A category-irrelevant feature is one that is not sufficient to categorize an object because it is shared by two or more categories, e.g., if "wings", then "bird" or "bat".

How do we isolate the category-determining features from the category-irrelevant features? [Diagram: the weights from "beak", "wings", and "eyes" to the Bird, Cat, and Bat nodes.]

One answer: competition between the weights in a separate network, the Rule Network, which is a copy of the Kohonen network.

The Rule Network. The Rule Network is a separate competitive network with a weight configuration that matches that of the Kohonen network. It "watches" the Kohonen Network to find rule-determining features. How?

We consider that the weights coming from each feature are in competition.

The results of the competition: the losing weights in the Rule Network have been pushed down by mutual competition, while the winning weight is now much stronger.

The network has found a category-determining feature for birds.
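A hedged sketch of what such a competition could look like, assuming it acts like a normalising nonlinearity over each feature's outgoing weights: a clear winner is amplified and the rest are suppressed, while near-ties stay flat. The power-rule form and the gain are assumptions; the slides do not specify the competition dynamics.

```python
import numpy as np

def compete(W, gain=4.0):
    """Sharpen each feature's outgoing weights (one row per feature)."""
    sharpened = W ** gain
    return sharpened / sharpened.sum(axis=1, keepdims=True)

W = np.array([[0.33, 0.34, 0.33],    # irrelevant feature: stays roughly flat
              [0.05, 0.50, 0.45],    # shared feature: no decisive winner
              [0.05, 0.90, 0.05]])   # diagnostic feature: one weight wins
print(np.round(compete(W), 2))
```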

Yes, but in a biologically plausible model, what could it possibly mean for "synaptic weights to be in competition"?

We will use noise to implement a mechanism that is equivalent to weight competition.

Revised architecture of the model, without weight competition: a Kohonen Network plus extension layers. The activations in the primary part of the network are echoed in the extension layers. The Kohonen Network forms category representations on the basis of perceptual similarity; the extension layers extract rules by discovering which of the stimuli's features are sufficient to determine category membership.

The neurobiologically plausible Kohonen Network

Kohonen Network: a spreading-activation, biologically plausible implementation. An input stimulus (f1, f2, f3) feeds the output nodes (Category A, Category B, Category C). Is the Category A node most active (activation f1·w11 + f2·w21 + f3·w31)? Or the Category B node (f1·w12 + f2·w22 + f3·w32)? Or the Category C node (f1·w13 + f2·w23 + f3·w33)? If f1·w12 + f2·w22 + f3·w32 is the largest, the Category B node is the winner.
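A minimal sketch of this spreading-activation variant: instead of a Euclidean-distance comparison, each category node's activation is the dot product of the input with that node's weights, and the most active node wins. The weight and feature values are illustrative.

```python
import numpy as np

W = np.array([[0.9, 0.1, 0.2],
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.9]])             # W[i, j]: weight from feature i to category j

f = np.array([1.0, 0.0, 0.5])               # input stimulus (f1, f2, f3)
activations = f @ W                         # f1*w1j + f2*w2j + f3*w3j for each j
winner = int(np.argmax(activations))        # the most active category node wins
print("activations:", activations, "-> winner: Category", "ABC"[winner])
```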

The winner activates itself highly, activates its near neighbours a little, and inhibits distant nodes. Learning is Hebbian: it depends on the activation of the sending and receiving nodes. The next time a similar stimulus is presented, the same output node wins.
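A sketch of this lateral interaction and Hebbian update, assuming a simple "Mexican hat" profile; the activation values (1.0, 0.3, -0.2) and the learning rate are illustrative assumptions, since the slides give no numbers.

```python
import numpy as np

def lateral_activation(n_nodes, winner):
    """Output activations after lateral interaction around the winner."""
    act = np.full(n_nodes, -0.2)            # distant nodes are inhibited
    act[winner] = 1.0                       # the winner is highly active
    for d in (-1, 1):                       # near neighbours are mildly active
        if 0 <= winner + d < n_nodes:
            act[winner + d] = 0.3
    return act

def hebbian_update(W, f, act, lr=0.05):
    """Hebbian learning: dW[i, j] is proportional to f[i] * act[j]."""
    return W + lr * np.outer(f, act)

act = lateral_activation(n_nodes=7, winner=3)
W = hebbian_update(np.zeros((3, 7)), np.array([1.0, 0.0, 1.0]), act)
print(act)   # [-0.2 -0.2  0.3  1.   0.3 -0.2 -0.2]
```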

Category A

Category B

Now, let’s focus on the features

The Rule Units are a separate layer of nodes whose activation echoes that in the Kohonen network. The Rule Units learn to map input stimuli onto category representations using only rule-determining features. How?

[Animated diagram sequence over four slides: activation passing from the feature units to the category units.]

Training Protocol
1. Present stimulus to the input layer
2. Pass activation through the various layers
3. Perform Hebbian update
4. Pass a "noise burst" through a single input unit, chosen randomly
5. Pass activation through the various layers
6. Perform Hebbian update
Repeat for ~200 training items
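A runnable skeleton of this protocol. The one-layer linear propagation, the Hebbian rule, the learning rate, and the binary stimuli are illustrative assumptions; the slides do not fully specify the layer dynamics.

```python
import numpy as np

rng = np.random.default_rng(1)
n_features, n_categories = 3, 3
W = rng.random((n_features, n_categories)) * 0.1   # small initial weights

def propagate(f, W):
    """Steps 2 and 5: pass activation from the input layer to the category layer."""
    return f @ W

def hebbian_update(W, f, act, lr=0.01):
    """Steps 3 and 6: Hebbian update, proportional to input x output activity."""
    return W + lr * np.outer(f, act)

stimuli = rng.integers(0, 2, size=(200, n_features)).astype(float)
for f in stimuli:                                  # repeat for ~200 training items
    W = hebbian_update(W, f, propagate(f, W))      # steps 1-3
    burst = np.zeros(n_features)
    burst[rng.integers(n_features)] = 1.0          # step 4: burst on one random unit
    W = hebbian_update(W, burst, propagate(burst, W))  # steps 5-6
```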

A category-determining feature: a noise burst is applied to an input unit chosen at random. [Animated diagrams over four slides: the burst propagates from the feature unit to the category units; a category-determining feature drives a single category node.]

A category-IRRELEVANT feature: a noise burst is applied to an input unit chosen at random. [Animated diagrams over four slides: the burst propagates from the feature unit to the category units; a category-irrelevant feature drives two or more category nodes.]
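An illustrative demonstration (not the model's exact dynamics) of why the noise bursts separate the two feature types: a burst through a category-determining feature produces one decisively active category node, so its weight can be strengthened, whereas a burst through a category-irrelevant feature spreads across several nodes and no weight wins. The weight values and the decisiveness margin are assumptions.

```python
import numpy as np

# Rows: eyes, wings, beak. Columns: Cat, Bird, Bat. All values illustrative.
W = np.array([[0.8, 0.8, 0.8],    # eyes:  shared by all three categories
              [0.0, 0.8, 0.8],    # wings: shared by Bird and Bat
              [0.0, 0.8, 0.0]])   # beak:  diagnostic of Bird

def burst_effect(W, feature, margin=0.5):
    """Does a noise burst through this feature pick out one decisive winner?"""
    act = np.sort(W[feature])                  # category activations, ascending
    decisive = act[-1] - act[-2] > margin      # one node clearly most active?
    return "weight strengthened" if decisive else "weights suppressed"

for i, name in enumerate(["eyes", "wings", "beak"]):
    print(name, "->", burst_effect(W, i))
```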

Training Stimuli

Model Weights After Training. [Figure: learned weight patterns for Cat A, Cat B, and Cat C.]

Test Stimuli. [Figure: distorted test exemplars: two from Category C (need rule) and one from Category B (should not need rule).]

Simulation Results Testing with Distorted Category Exemplars

Why did the Rule+Statistical network do better than the Kohonen Network alone? The statistical-learning network categorises according to any features consistently present in the stimuli; the rule-learning network categorises according to only the diagnostic features present in the stimuli.

Modeling Reaction Times and Supervised Learning. Reaction times: this network can successfully produce RT data. When there is a conflict between the Rule Network answer and the Statistical Network answer, the model will still get the right answer, but it will take longer than when the Rule Network and the Statistical Network agree. Supervised learning: by (occasionally) artificially activating category nodes, we can significantly speed up category learning. So, we can address the poverty-of-the-stimulus problem.
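A toy sketch of the reaction-time idea, assuming the Rule Network's answer prevails on conflict trials (as the slide implies) and two fixed response latencies; both RT constants are illustrative, not fitted values.

```python
def respond(rule_answer, statistical_answer, rt_fast=300, rt_slow=550):
    """Return (category, reaction time in ms); the Rule Network's answer prevails."""
    if rule_answer == statistical_answer:
        return rule_answer, rt_fast         # agreement: fast, correct response
    return rule_answer, rt_slow             # conflict: still correct, but slower

print(respond("C", "C"))   # ('C', 300)
print(respond("C", "B"))   # ('C', 550)
```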

Conclusions. We have developed an unsupervised network (which can also be used for supervised learning) that extracts rules from knowledge about the statistics of perceptual features. It successfully integrates conceptual and perceptual mechanisms of categorization, and it is biologically plausible (its two components potentially map onto areas of the brain).

Background Lit (Other Models). Neural networks that do rule extraction have not modelled categorisation behaviour (Földiák, 1989) or have not modelled cognition (Towell & Shavlik, 1993). Models of categorisation behaviour have not employed rules (e.g., the GCM, Nosofsky, 1986; ALCOVE, Kruschke, 1992) or have used off-the-shelf rules without modelling their acquisition (e.g., ATRIUM, Erickson & Kruschke, 1998; COVIS, Ashby et al., 1998).

Evidence for the Model. Rehder and Hoffman (2005): adult humans learning new categories. Eye-tracking shows that people stop attending to irrelevant features once they have learned the rule. But the stimuli were not very ecologically realistic, and feedback was given on every trial.