An Unsupervised Connectionist Model of Rule Emergence in Category Learning Rosemary Cowell & Robert French LEAD-CNRS, Dijon, France EC FP6 NEST Grant
Key idea driving our research: A rule for categorization has emerged when the observer disregards a significant subset of an object’s features and focuses only on a reduced subset of its features in order to determine the object’s category.
[Figure: Rule. Axis: no. of features attended to. Labels: “Eureka moment”, “No Eureka moment”.]
Goal: To develop an unsupervised learning system from which simple rules emerge. Young infants do not receive “rule instruction” for category learning. Animals have even less rule instruction than human infants. We are not claiming that ALL rule learning occurs in this manner, but rather that some, especially for young infants, does.
Kohonen Network. A new object with features $(f_1, f_2, f_3)$ is presented to a network with three category nodes (Category A, Category B, Category C). Feature $f_i$ is connected to category node $j$ by the weight $w_{ij}$. Is $(f_1, f_2, f_3)$ closest to $(w_{11}, w_{21}, w_{31})$, to $(w_{12}, w_{22}, w_{32})$, or to $(w_{13}, w_{23}, w_{33})$? If $(w_{13}, w_{23}, w_{33})$ is the winner, we modify it to be a little closer to $(f_1, f_2, f_3)$ than before.
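The winner search and weight update described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Euclidean distance as the measure of "closest" and a fixed learning rate `lr`, and the function names and numeric values are hypothetical.

```python
import numpy as np

def find_winner(weights, stimulus):
    """Index of the category whose weight vector is closest to the stimulus.

    weights has shape (n_features, n_categories); column j holds
    (w_1j, w_2j, w_3j), the weight vector of category node j.
    """
    distances = np.linalg.norm(weights - stimulus[:, None], axis=0)
    return int(np.argmin(distances))

def update_winner(weights, stimulus, winner, lr=0.1):
    """Move only the winner's weight vector a little closer to the stimulus."""
    new_weights = weights.copy()
    new_weights[:, winner] += lr * (stimulus - weights[:, winner])
    return new_weights

# Illustrative values only: three features, three categories (A, B, C).
weights = np.array([[0.9, 0.1, 0.2],
                    [0.8, 0.9, 0.1],
                    [0.1, 0.2, 0.7]])
stimulus = np.array([0.2, 0.2, 0.9])

winner = find_winner(weights, stimulus)          # index 2, i.e. Category C
updated = update_winner(weights, stimulus, winner)
```

With these values the Category C vector wins, and after the update it lies closer to the stimulus while the other two weight vectors are unchanged.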
[Figure sequence: the weight vectors $(w_{11}, w_{21}, w_{31})$, $(w_{12}, w_{22}, w_{32})$ and $(w_{13}, w_{23}, w_{33})$ plotted on the axes “No. of calories consumed per day” and “Life expectancy”.]
The new weight vectors are now the centroids of each category.
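To see why repeated winner updates can leave each weight vector at the centroid of its category, here is a small sketch on synthetic data. The data, the starting weights and the 1/n learning-rate schedule are all assumptions made for illustration; the slides do not state which schedule the model uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, illustrative data: three well-separated clusters in a 2-D feature
# space (both axes rescaled to [0, 1]); these are not the data from the slides.
centres = np.array([[0.2, 0.8], [0.8, 0.8], [0.5, 0.2]])
labels = rng.integers(0, 3, size=300)
data = centres[labels] + rng.normal(scale=0.03, size=(300, 2))

# One weight vector per category node, started at rough initial guesses.
weights = np.array([[0.3, 0.7], [0.7, 0.7], [0.5, 0.3]])
wins = np.zeros(3, dtype=int)

for x in data:
    winner = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    wins[winner] += 1
    # A 1/n learning rate makes each weight vector the running mean of the
    # points its node has won (an assumed schedule, chosen for clarity).
    weights[winner] += (x - weights[winner]) / wins[winner]

for k in range(3):
    print(k, weights[k].round(3), data[labels == k].mean(axis=0).round(3))
```

Each printed pair shows a learned weight vector next to the mean of the points in the corresponding cluster; under these assumptions the two coincide.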
Let’s see how we could extract rules from this kind of system.
A first idea: a copy of the Kohonen network (a competitive learning network) watches the weights of the original Kohonen network.
Category-Determining Features. A category-determining feature is one that is sufficient to categorize an object: e.g., if $f_3$ is present, then the object is in Category C and will not be in Category A or B. In other words, if the object is in Category C, $f_3$ will match $w_{33}$ but will not match $w_{31}$ or $w_{32}$.
Category-Irrelevant Features. A category-irrelevant feature is one that is not sufficient to categorize an object because it is shared by objects in two or more categories: e.g., if $f_3$ is present, then the object may be in Category A or C. In other words, $f_3$ matches both $w_{31}$ and $w_{33}$.
A concrete example: a Cat-Bird-Bat categorizer. The input stimulus has three features (eyes, wings, beak), each connected to three category nodes (Bird, Cat, Bat).
A category-determining feature is one that is sufficient to categorize an object, e.g., if ‘beak’, then ‘bird’ and not ‘bat’ or ‘cat’.
A category-irrelevant feature is one that is not sufficient to categorize an object because it is shared by two or more categories, e.g., if ‘wings’, then ‘bird’ or ‘bat’.
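The distinction can be made concrete with a small sketch over the Cat-Bird-Bat example. The weight values, the 0.5 threshold and the function name `determining_features` are assumptions for illustration; they are not taken from the model.

```python
import numpy as np

features = ["eyes", "wings", "beak"]
categories = ["bird", "cat", "bat"]

# Illustrative weights (rows: features, columns: categories); 1.0 means the
# feature is typical of that category. Values are assumed, not from the model.
weights = np.array([[1.0, 1.0, 1.0],   # eyes: shared by all three
                    [1.0, 0.0, 1.0],   # wings: shared by bird and bat
                    [1.0, 0.0, 0.0]])  # beak: bird only

def determining_features(weights, threshold=0.5):
    """Map each category-determining feature to the single category it picks out."""
    result = {}
    for name, row in zip(features, weights):
        strong = np.flatnonzero(row > threshold)
        if len(strong) == 1:          # exactly one category claims this feature
            result[name] = categories[strong[0]]
    return result

rules = determining_features(weights)
print(rules)  # {'beak': 'bird'}
```

Here ‘eyes’ and ‘wings’ are category-irrelevant because more than one category has a strong weight for them, while ‘beak’ yields the rule "if beak, then bird".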
How do we isolate the category-determining features from the category-irrelevant features? [Figure: the features beak, wings and eyes, each shown with its connections to the Bird, Cat and Bat nodes.]
One answer: competition between the weights in a separate network, the Rule Network, that is a copy of the Kohonen network.
The Rule Network. The Rule Network is a separate competitive network with a weight configuration that matches that of the Kohonen network. It “watches” the Kohonen Network to find rule-determining features. How?
We consider the weights coming from each feature to be in competition. [Figure: the features beak, wings and eyes, each with its weights to the Bird, Cat and Bat nodes.]
The results of the competition: some weights in the Rule Network have been pushed down by mutual competition, while one weight in the Rule Network has won the competition and is now much stronger.
The network has found a category-determining feature for birds.
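The slides do not say how the weights compete, so the following is only one hypothetical form such a competition could take: each weight leaving a feature grows in proportion to its own strength and is inhibited by the summed strength of its rivals. The parameters `beta`, `rate` and `steps`, the clipping to [0, 1], and the starting values are all assumptions.

```python
import numpy as np

def compete(weights, beta=1.5, rate=0.1, steps=50):
    """Let the weights leaving each feature (one row each) inhibit one another.

    beta, rate and steps are assumed values; beta > 1 means that two equally
    strong weights suppress each other rather than coexisting.
    """
    w = weights.astype(float).copy()
    for _ in range(steps):
        others = w.sum(axis=1, keepdims=True) - w   # total of the rival weights
        w = np.clip(w + rate * (w - beta * others), 0.0, 1.0)
    return w

# Rows: eyes, wings, beak; columns: bird, cat, bat (illustrative values).
start = np.array([[0.6, 0.6, 0.6],
                  [0.6, 0.1, 0.6],
                  [0.6, 0.1, 0.1]])
final = compete(start)
print(final.round(2))
```

In this toy run the weights from ‘eyes’ and ‘wings’ push each other down towards zero, while the beak-to-bird weight has no strong rival and grows to its ceiling.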
Yes, but in a biologically plausible model, what could it possibly mean for “synaptic weights to be in competition”?
We will implement a mechanism that is equivalent to weight-competition using noise.
Revised architecture of the model, without weight competition.
Kohonen Network: forms category representations on the basis of perceptual similarity.
Extension layers: extract rules by discovering which of the stimuli’s features are sufficient to determine category membership.
The activations in the primary part of the network are echoed in the extension layers.
The neurobiologically plausible Kohonen Network
Kohonen Network: a spreading-activation, biologically plausible implementation. The input stimulus $(f_1, f_2, f_3)$ feeds three output nodes (Category A, Category B, Category C). Is the Cat A node most active, with activation $f_1 w_{11} + f_2 w_{21} + f_3 w_{31}$? Or is the Cat B node most active, with activation $f_1 w_{12} + f_2 w_{22} + f_3 w_{32}$? Or is the Cat C node most active, with activation $f_1 w_{13} + f_2 w_{23} + f_3 w_{33}$? If $f_1 w_{12} + f_2 w_{22} + f_3 w_{32}$ is the largest, the Cat B node is the winner.
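The weighted-sum version of the winner search amounts to one dot product per output node, as in this short sketch (the weight and stimulus values are illustrative assumptions). Picking the largest weighted sum agrees with picking the closest weight vector whenever the weight vectors all have the same length.

```python
import numpy as np

# Rows: features f1..f3; columns: category nodes A, B, C (illustrative values).
weights = np.array([[0.9, 0.1, 0.2],
                    [0.8, 0.9, 0.1],
                    [0.1, 0.2, 0.7]])
stimulus = np.array([0.1, 1.0, 0.2])

activations = stimulus @ weights        # one weighted sum per category node
winner = "ABC"[int(np.argmax(activations))]
print(activations, winner)
```

With these values the activations are 0.91, 0.95 and 0.26, so the Cat B node is the winner.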
The winner activates itself highly, activates its near neighbours a little, and inhibits distant nodes. Learning is Hebbian: it depends on the activation of the sending and receiving nodes. The next time a similar stimulus is presented, the same output node wins.
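A minimal sketch of this lateral interaction and the Hebbian update follows. It assumes a one-dimensional row of output nodes and made-up activation levels (`high`, `little`, `inhibit`) and learning rate `lr`; none of these values come from the slides.

```python
import numpy as np

def lateral_activation(n_nodes, winner, near=1, high=1.0, little=0.3, inhibit=-0.1):
    """Output activations on a 1-D row of nodes: winner high, near neighbours
    slightly active, distant nodes inhibited (all values are assumptions)."""
    distance = np.abs(np.arange(n_nodes) - winner)
    return np.where(distance == 0, high, np.where(distance <= near, little, inhibit))

def hebbian_update(weights, stimulus, output, lr=0.1):
    """Change each weight in proportion to sending x receiving activation."""
    return weights + lr * np.outer(stimulus, output)

weights = np.full((3, 5), 0.5)            # 3 input features, 5 output nodes
stimulus = np.array([1.0, 0.0, 1.0])
output = lateral_activation(5, winner=2)
weights = hebbian_update(weights, stimulus, output)

# Presenting the same stimulus again now favours the same output node.
print(int(np.argmax(stimulus @ weights)))
```

Weights from inactive input units are left unchanged because the sending activation is zero, and the strengthened weights make node 2 the most active node when the stimulus is presented again.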
Category A
Category B
Now, let’s focus on the features
The Rule Units are a separate layer of nodes whose activation echoes that in the Kohonen network. The Rule Units learn to map input stimuli onto category representations using only rule-determining features. How?
[Figure sequence: feature units connected to category units.]
Training Protocol
1. Present stimulus to input layer
2. Pass activation through the various layers
3. Perform Hebbian update
4. Pass a ‘noise burst’ through a single input unit, chosen randomly
5. Pass activation through the various layers
6. Perform Hebbian update
Repeat for ~200 training items
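The six steps can be sketched as a training loop. This is a toy reading of the protocol under several loudly stated assumptions, not the authors' implementation: the stimuli are three made-up binary prototypes; the Kohonen weights are assumed to be already trained; the first Hebbian update (step 3) is assumed to act on the Kohonen weights and the second (step 6) on the rule weights; and lateral inhibition is approximated by giving the winning node an activation equal to its margin over the runner-up. All names (`kohonen`, `rule`, `category_activation`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stimuli (assumed): rows are prototypes of categories A, B, C over five
# binary features. f0 and f1 are shared; f2, f3, f4 are each unique to one category.
prototypes = np.array([[1, 1, 1, 0, 0],
                       [1, 1, 0, 1, 0],
                       [1, 0, 0, 0, 1]], dtype=float)
n_cats, n_feats = prototypes.shape

kohonen = prototypes.T.copy()          # features x categories, assumed already trained
rule = np.zeros((n_feats, n_cats))     # rule weights start empty
lr = 0.1                               # assumed learning rate

def category_activation(x, weights):
    """Winner-take-all output: the winner's activation is its margin over the
    runner-up (a crude stand-in for lateral inhibition); all others are 0."""
    net = x @ weights
    order = np.argsort(net)
    out = np.zeros_like(net)
    out[order[-1]] = net[order[-1]] - net[order[-2]]
    return out

for _ in range(200):
    # Steps 1-3: present a stimulus, propagate, Hebbian update with decay
    # (instar form) on the Kohonen weights of the winning node.
    x = prototypes[rng.integers(n_cats)]
    act = category_activation(x, kohonen)
    winner = int(np.argmax(act))
    kohonen[:, winner] += lr * act[winner] * (x - kohonen[:, winner])

    # Steps 4-6: noise burst on one random input unit, propagate, Hebbian
    # update on the rule weights, with the rule units echoing the category activation.
    noise = np.zeros(n_feats)
    noise[rng.integers(n_feats)] = 1.0
    echo = category_activation(noise, kohonen)
    rule += lr * np.outer(noise, echo)

print(rule.round(2))
```

Under these assumptions a burst on a shared feature activates two or more category nodes equally, the margin is zero and nothing is learned, whereas a burst on a unique feature activates a single category node and strengthens that one rule weight. After training, the only non-zero rule weights are f2 to A, f3 to B and f4 to C, so `prototypes @ rule` classifies each prototype from its diagnostic feature alone.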
A category-determining feature: a noise burst is applied to an input unit at random. [Figure sequence: feature units and category units.]
A category-IRRELEVANT feature: a noise burst is applied to an input unit at random. [Figure sequence: feature units and category units.]
Training Stimuli
Model Weights After Training Cat A Cat B Cat C
Test Stimuli: Category C (need rule); Category C (need rule); Category B (should not need rule)
Simulation Results Testing with Distorted Category Exemplars
Why did the Rule+Statistical network do better than the Kohonen Network alone?
Statistical-learning network: categorises according to any features consistently present in stimuli.
Rule-learning network: categorises according to only diagnostic features present in stimuli.
Modeling Reaction Times and Supervised Learning
Reaction Times: This network can successfully produce RT data. When there is a conflict between the Rule Network answer and the Statistical Network answer, the network will still get the right answer, but it will take longer than when the Rule Network and the Statistical Network are in agreement.
Supervised Learning: By (occasionally) artificially activating category nodes, we can significantly speed up category learning. So, we can address the Poverty of the Stimulus problem.
Conclusions
We have developed an unsupervised network (that can also be used to do supervised learning) that extracts the rule from knowledge about the statistics of perceptual features. It successfully integrates conceptual and perceptual mechanisms of categorization. It is biologically plausible (its two components potentially map onto areas in the brain).
Background Lit (Other Models)
Neural networks that do rule-extraction have not modelled categorisation behaviour (Földiák, 1989) or have not modelled cognition (Towell & Shavlik, 1993).
Models of categorisation behaviour have not employed rules (e.g., GCM by Nosofsky; ALCOVE by Kruschke, 1998) or have used “off-the-shelf” rules and not modelled their acquisition (e.g., ATRIUM by Erickson & Kruschke, 1998; COVIS by Ashby et al., 1998).
Evidence for the Model
Rehder and who? (2005): adult humans learning new categories. Eye-tracking: people stop attending to irrelevant features when they learn the rule. But the stimuli were not very ecologically realistic, and there was feedback on every trial.