Backprop: Representations, Representations, Representations
Garrison W. Cottrell
Gary's Unbelievable Research Unit (GURU)
Computer Science and Engineering Department
Temporal Dynamics of Learning Center
Institute for Neural Computation, UCSD

Characteristics of perceptron learning
- Supervised learning: give it a set of input-output examples to model the function (a teaching signal)
- Error-correction learning: weights change only when the output is wrong
- Random presentation of patterns
- Slow! Learning on some patterns can ruin learning on others

Perceptron Learning
Learning rule: w_i(t+1) = w_i(t) + α (teacher − output) x_i   (α is the learning rate)
This is known as the delta rule because learning is based on the delta (difference) between what you did and what you should have done: δ = (teacher − output)
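
A minimal sketch of this rule in Python/numpy (the threshold activation, learning rate, and bias handling are assumptions, not specified on the slide):

    import numpy as np

    def perceptron_train(X, teacher, alpha=0.1, epochs=25, seed=0):
        # Delta rule: w_i(t+1) = w_i(t) + alpha * (teacher - output) * x_i
        rng = np.random.default_rng(seed)
        X = np.hstack([X, np.ones((len(X), 1))])  # bias as an always-on input
        w = rng.normal(scale=0.1, size=X.shape[1])
        for _ in range(epochs):
            for i in rng.permutation(len(X)):     # random presentation of patterns
                output = float(X[i] @ w > 0)      # threshold unit
                w += alpha * (teacher[i] - output) * X[i]
        return w

    # OR is linearly separable, so the perceptron can learn it:
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    w = perceptron_train(X, np.array([0, 1, 1, 1], dtype=float))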

Problems with perceptrons
- The learning rule comes with a great guarantee: anything a perceptron can compute, it can learn to compute.
- Problem: lots of things were not computable, e.g., XOR (Minsky & Papert, 1969).
- Minsky & Papert said: if you had hidden units, you could compute any Boolean function. But no learning rule exists for such multilayer networks, and we don't think one will ever be discovered.

Problems with perceptrons (figure)

Aside about perceptrons
- They didn't have hidden units - but Rosenblatt assumed nonlinear preprocessing!
- Hidden units compute features of the input
- The nonlinear preprocessing is a way to choose features by hand
- Support Vector Machines essentially do this in a principled way, followed by a (highly sophisticated) perceptron learning algorithm

Enter Rumelhart, Hinton, & Williams (1985)
(Re-)Discovered a learning rule for networks with hidden units. Works a lot like the perceptron algorithm:
- Randomly choose an input-output pattern
- Present the input, let activation propagate through the network
- Give the teaching signal
- Propagate the error back through the network (hence the name back propagation)
- Change the connection strengths according to the error

Enter Rumelhart, Hinton, & Williams (1985)
- The actual algorithm uses the chain rule of calculus to go downhill in an error measure with respect to the weights
- The hidden units must learn features that solve the problem
(Figure: activation flows from inputs through hidden units to outputs; error flows back.)

Backpropagation Learning
Learning rule: Δw_ij = α δ_j a_i   (α is the learning rate)
Here, δ_j = (t_j − y_j) f′(net_j) for output units, and δ_j = f′(net_j) Σ_k δ_k w_jk for hidden units.
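
A minimal numpy sketch of these equations, trained on XOR (which ties into the next slides); the two-hidden-unit sigmoid architecture, learning rate, and epoch count are assumptions:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    t = np.array([[0], [1], [1], [0]], dtype=float)           # XOR targets

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(scale=0.5, size=(2, 2)), np.zeros(2)  # input -> hidden
    W2, b2 = rng.normal(scale=0.5, size=(2, 1)), np.zeros(1)  # hidden -> output
    alpha = 0.5

    for _ in range(20000):
        h = sigmoid(X @ W1 + b1)                 # forward pass
        y = sigmoid(h @ W2 + b2)
        d_out = (t - y) * y * (1 - y)            # delta_j = (t_j - y_j) f'(net_j)
        d_hid = (d_out @ W2.T) * h * (1 - h)     # error propagated back
        W2 += alpha * h.T @ d_out                # go downhill in the error
        b2 += alpha * d_out.sum(axis=0)
        W1 += alpha * X.T @ d_hid
        b1 += alpha * d_hid.sum(axis=0)

    print(np.round(y.ravel(), 2))  # typically near [0, 1, 1, 0]; unlucky initial
                                   # weights can stall in a local minimum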

XOR
Here, the hidden units learned AND and OR - two features that, when combined appropriately, can solve the problem.
(Figure: a random network before training and the trained XOR network, with AND and OR hidden units.)

XOR
But, depending on initial conditions, there are an infinite number of ways to do XOR - backprop can surprise you with innovative solutions.

Why is/was this wonderful?
- Efficiency
- Learns internal representations
- Generalizes to recurrent networks

Representations
In the next k slides, where k is some integer, we will look at various representations backprop has learned, and problems backprop has solved.
The mantra here is: Backprop learns representations in the service of the task.

XOR
Note here that the number in the unit is the bias: so think of this as what the unit "likes to do" in the absence of any input.

XOR
What Boolean function is the network without the hidden unit computing?
(Truth table with columns X1, X2, Y.)

XOR
What function is the hidden unit computing? It only turns on for (0,0)!

Symmetry (figure)

Addition
Here - it is useful to think of these as linear threshold units. What is the hidden unit on the right doing? The one on the left?

NetTalk (Sejnowski & Rosenberg, 1987)
- A network that learns to read aloud from text
- Trained on a corpus of 500 words
- Overnight on a Vax
- Business Week: It learns the abilities of a six-year-old child overnight! (a lot of hype…)
- Maps time into space - the text is stepped through the input (a lot of alignment issues)

NetTalk Architecture

Test set

NetTalk hidden unit analysis

Hinton's Family Trees example
Idea: Learn to represent relationships between people that are encoded in a family tree:

Hinton's Family Trees example
Idea 2: Learn distributed representations of concepts: localist outputs.
- Learn: features of these entities useful for solving the task
- Input: localist people, localist relations
- Localist: one unit "ON" to represent each item

People hidden units: Hinton diagram
What is unit 1 encoding?

People hidden units: Hinton diagram
What is unit 2 encoding?

People hidden units: Hinton diagram
What is unit 6 encoding?

People hidden units: Hinton diagram
When all three are on, these units pick out Christopher and Penelope. Other combinations pick out other parts of the trees.

Relation units
What does the lower middle one code?

Lessons
- The network learns features in the service of the task - i.e., it learns features on its own.
- This is useful if we don't know what the features ought to be.
- Can explain some human phenomena.

Switch to Demo
This demo is downloadable from my website under "Resources", about the middle of the page: "A matlab neural net demo with face processing."

Another example
In the next example(s), I make two points:
- The perceptron algorithm is still useful!
- Representations learned in the service of the task can explain the "Visual Expertise Mystery"

A Face Processing System
(Pipeline: Pixel (Retina) Level → Gabor Filtering (Perceptual/V1 Level) → PCA (Object/IT Level) → Neural Net (Category Level: Happy, Sad, Afraid, Angry, Surprised, Disgusted).)

The Face Processing System
(Same pipeline with identity outputs at the Category Level: Bob, Carol, Ted, Alice.)

The Face Processing System
(Same pipeline applied to objects as well as faces - outputs Bob, Carol, Ted, Cup, Can, Book - with the Object (IT) level acting as a general feature level.)

The Gabor Filter Layer
- Basic feature: the 2-D Gabor wavelet filter (Daugman, 85)
- These model the processing in early visual areas
- Convolve with the image, take the magnitudes, and subsample in a 29x36 grid (see the sketch below)
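
A sketch of one complex 2-D Gabor filter and the convolve/magnitude/subsample step (filter parameters and image size here are illustrative assumptions, not the model's actual settings):

    import numpy as np
    from scipy.signal import fftconvolve

    def gabor(size=32, wavelength=8.0, theta=0.0, sigma=4.0):
        # A sinusoidal carrier under a Gaussian envelope, rotated to angle theta
        r = np.arange(size) - size // 2
        x, y = np.meshgrid(r, r)
        xr = x * np.cos(theta) + y * np.sin(theta)
        envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
        return envelope * np.exp(2j * np.pi * xr / wavelength)

    img = np.random.rand(64, 64)                    # stand-in for a face image
    response = fftconvolve(img, gabor(theta=np.pi / 4), mode='same')
    magnitude = np.abs(response)                    # keep magnitudes only
    subsampled = magnitude[::8, ::8]                # coarse sampling grid

A full bank repeats this over several spatial scales and orientations before subsampling.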

Principal Components Analysis
- The Gabor filters give us 40,600 numbers
- We use PCA to reduce this to 50 numbers (see the sketch below)
- PCA is like Factor Analysis: it finds the underlying directions of maximum variance
- PCA can be computed in a neural network through a competitive Hebbian learning mechanism
- Hence this is also a biologically plausible processing step
- We suggest this leads to representations similar to those in Inferior Temporal cortex
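
A sketch of the 40,600 → 50 reduction with a plain SVD (the data matrix is a random stand-in for a set of Gabor-magnitude vectors):

    import numpy as np

    G = np.random.rand(200, 40600)      # 200 images x 40,600 Gabor magnitudes
    Gc = G - G.mean(axis=0)             # center the data
    U, S, Vt = np.linalg.svd(Gc, full_matrices=False)
    # Rows of Vt are the directions of maximum variance (principal components)
    codes = Gc @ Vt[:50].T              # a 50-number code per image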

How to do PCA with a neural network
(Cottrell, Munro & Zipser, 1987; Cottrell & Fleming 1990; Cottrell & Metcalfe 1990; O'Toole et al. 1991)
A self-organizing network that learns whole-object representations (features, Principal Components, Holons, eigenfaces).
(Figure: input from the perceptual layer feeding a Holons (Gestalt) layer.)
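
The network shown is an autoencoder: it is trained to reproduce its input through the narrow holon layer, and a linear version with tied weights converges on the same subspace PCA finds. A gradient-descent sketch (the cited papers used related but not identical learning rules; sizes and rates here are assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 64))      # stand-in inputs from the perceptual layer
    X -= X.mean(axis=0)
    k, lr = 8, 0.001                    # k holons, learning rate
    W = rng.normal(scale=0.01, size=(64, k))   # tied encode/decode weights

    for _ in range(2000):
        H = X @ W                       # holon (Gestalt) layer activations
        E = X - H @ W.T                 # reconstruction error
        # Gradient descent on ||X - X W W^T||^2 with tied weights
        W += lr * (X.T @ E @ W + E.T @ X @ W) / len(X)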

Holons
- They act like face cells (Desimone, 1991):
  - Response of single units is strong despite occluding eyes, e.g.
  - Response drops off with rotation
  - Some fire to my dog's face
- A novel representation: Distributed templates - each unit's optimal stimulus is a ghostly looking face (template-like), but many units participate in the representation of a single face (distributed).
- For this audience: Neither exemplars nor prototypes!
- Explain holistic processing: Why? If stimulated with a partial match, the firing represents votes for this template: units "downstream" don't know what caused this unit to fire. (more on this later…)

The Final Layer: Classification
(Cottrell & Fleming 1990; Cottrell & Metcalfe 1990; Padgett & Cottrell 1996; Dailey & Cottrell, 1999; Dailey et al. 2002)
The holistic representation is then used as input to a categorization network trained by supervised learning. Excellent generalization performance demonstrates the sufficiency of the holistic representation for recognition.
(Figure: Holons → Categories; outputs such as Cup, Can, Book, Greeble, Face, Bob, Carol, Ted, Happy, Sad, Afraid, etc.)

The Final Layer: Classification
- Categories can be at different levels: basic, subordinate.
- Simple learning rule (~delta rule). It says (mild lie here): add inputs to your weights (synaptic strengths) when you are supposed to be on, subtract them when you are supposed to be off. This makes your weights "look like" your favorite patterns - the ones that turn you on.
- When no hidden units => no back propagation of error.
- When hidden units: we get task-specific features (most interesting when we use the basic/subordinate distinction).

Facial Expression Database
- Ekman and Friesen quantified muscle movements (Facial Actions) involved in prototypical portrayals of happiness, sadness, fear, anger, surprise, and disgust.
- Result: the Pictures of Facial Affect Database (1976).
  - 70% agreement on emotional content by naive human subjects.
  - 110 images, 14 subjects, 7 expressions: Anger, Disgust, Neutral, Surprise, Happiness (twice), Fear, and Sadness.
- This is actor "JJ": the easiest for humans (and our model) to classify.

Results (Generalization)
Kendall's tau (rank order correlation): .667, p = .0441
Note: This is an emergent property of the model!

Expression   Network % Correct   Human % Agreement
Happiness    100.0%              98.7%
Surprise     100.0%              92.4%
Disgust      100.0%              92.3%
Anger        89.2%               88.9%
Sadness      82.9%               89.2%
Fear         66.7%               87.7%
Average      89.9%               91.6%

Correlation of Net/Human Errors
- Like all good Cognitive Scientists, we like our models to make the same mistakes people do!
- Networks and humans have a 6x6 confusion matrix for the stimulus set.
- This suggests looking at the off-diagonal terms: the errors.
- Correlation of off-diagonal terms: r = [F(1,28) = 13.3; p = ]
- Again, this correlation is an emergent property of the model: it was not told which expressions were confusing.

Examining the Net's Representations
- We want to visualize "receptive fields" in the network.
- But the Gabor magnitude representation is noninvertible.
- We can learn an approximate inverse mapping, however.
- We used linear regression to find the best linear combination of Gabor magnitude principal components for each image pixel.
- Then projecting each unit's weight vector into image space with the same mapping visualizes its "receptive field" (see the sketch below).
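
A sketch of the inverse-mapping trick (array names and shapes are illustrative; P would really hold the PCA codes of the training images and I their pixels):

    import numpy as np

    P = np.random.rand(200, 50)            # images x PCA coefficients
    I = np.random.rand(200, 4096)          # same images as flattened pixels
    A = np.hstack([P, np.ones((200, 1))])  # add an intercept column
    # Least-squares map from PCA space back to pixel space, one pixel at a time
    M, *_ = np.linalg.lstsq(A, I, rcond=None)

    w = np.random.rand(50)                 # a unit's weight vector over PCA inputs
    rf = np.append(w, 1.0) @ M             # its "receptive field" as an image
    avg_face = M[-1]                       # intercept row ~ the "average face"
    rf_centered = rf - avg_face            # what the unit attends to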

Examining the Net's Representations
The "y-intercept" coefficient for each pixel is simply the average pixel value at that location over all faces, so subtracting the resulting "average face" shows more precisely what the units attend to:
Apparently local features appear in the global templates.

Morph Transition Perception
- Morphs help psychologists study categorization behavior in humans.
- Example: JJ Fear to Sadness morph (0%, 10%, 30%, 50%, 70%, 90%, 100%).
- Young et al. (1997) Megamix: presented images from morphs of all 6 emotions (15 sequences) to subjects in random order; task is 6-way forced choice button push.

Results: classical Categorical Perception
- Sharp boundaries…
- …and higher discrimination of pairs of images when they cross a perceived category boundary.
(Figure: 6-way alternative forced choice identification and percent correct discrimination along the Happiness-Fear-Sadness-Disgust-Anger-Surprise morph sequences.)

Results: Non-categorical RTs
"Scalloped" reaction times.
(Figure: button-push reaction time along the same Happiness-Fear-Sadness-Disgust-Anger-Surprise morph sequences.)

Results: More non-categorical effects
- Young et al. also had subjects rate the 1st, 2nd, and 3rd most apparent emotion.
- At the 70/30 morph level, subjects were above chance at detecting the mixed-in emotion.
- These data seem more consistent with continuous theories of emotion.

Modeling Megamix
- 1 trained neural network = 1 human subject.
- 50 networks, 7 random examples of each expression for training, remainder for holdout.
- Identification = average of network outputs.
- Response time = uncertainty of maximal output (1.0 − y_max).
- Mixed-in expression detection: record 1st, 2nd, 3rd largest outputs.
- Discrimination: 1 − correlation of layer representations (see the sketch below).
- We can then find the layer that best accounts for the data.
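
A sketch of these response measures (the output vectors are random stand-ins; in the real experiments each of the 50 networks contributes one 6-way output per image):

    import numpy as np

    ys = np.random.dirichlet(np.ones(6), size=50)  # stand-in: 50 nets' outputs

    identification = ys.mean(axis=0)               # average of network outputs
    rt = 1.0 - ys.max(axis=1)                      # per-net RT = 1.0 - y_max
    top3 = np.argsort(ys, axis=1)[:, ::-1][:, :3]  # 1st/2nd/3rd largest outputs

    def discrimination(rep_a, rep_b):
        # 1 - correlation between a layer's representations of two morph neighbors
        return 1.0 - np.corrcoef(rep_a, rep_b)[0, 1]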

Modeling Six-Way Forced Choice
Overall correlation r = .9416, with NO FIT PARAMETERS!
(Figure: model identification curves along the Happiness-Fear-Sadness-Disgust-Anger-Surprise morph sequences.)

Model Discrimination Scores
The model fits the data best at a precategorical layer: the layer we call the "object" layer, NOT at the category level.
(Figure: percent correct discrimination for humans, the model output layer (r = 0.36), and the model object layer (r = 0.61).)

Discrimination
- Classically, one requirement for "categorical perception" is higher discrimination of two stimuli at a fixed distance apart when those two stimuli cross a category boundary.
- Indeed, Young et al. found in two kinds of tests that discrimination was highest at category boundaries.
- The result that we fit the data best at a layer before any categorization occurs is significant: in some sense, the category boundaries are "in the data," or at least, in our representation of the data.

Outline
- An overview of our facial expression recognition system.
- The internal representation shows the model's prototypical representations of Fear, Sadness, etc.
- How our model accounts for the "categorical" data.
- How our model accounts for the "non-categorical" data.
- Discussion.
- Conclusions for part 1.

Reaction Time: Human/Model
Correlation between model & data: .6771, p < .001
(Figure: human subjects' reaction time vs. model reaction time (1 − max output) along the morph sequences.)

Mix Detection in the Model
Can the network account for the continuous data as well as the categorical data? YES.

Human/Model Circumplexes
- These are derived from similarities between images using non-metric multidimensional scaling (see the sketch below).
- For humans: similarity is correlation between 6-way forced-choice button pushes.
- For networks: similarity is correlation between 6-category output vectors.
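
A sketch of deriving such a 2-D layout with non-metric MDS, using sklearn's implementation (the similarity matrix here is random stand-in data):

    import numpy as np
    from sklearn.manifold import MDS

    # sim[i, j]: correlation between the 6-category output vectors for items i, j
    sim = np.corrcoef(np.random.rand(6, 100))   # stand-in 6x6 similarity matrix
    dissim = 1.0 - sim                          # MDS expects dissimilarities
    np.fill_diagonal(dissim, 0.0)

    mds = MDS(n_components=2, metric=False, dissimilarity='precomputed',
              random_state=0)
    coords = mds.fit_transform(dissim)          # one 2-D point per expression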

Outline
- An overview of our facial expression recognition system.
- How our model accounts for the "categorical" data.
- How our model accounts for the "two-dimensional" data.
- The internal representation shows the model's prototypical representations of Fear, Sadness, etc.
- Discussion.
- Conclusions for part 1.

Discussion
Our model of facial expression recognition:
- Performs the same task people do
- On the same stimuli
- At about the same accuracy
Without actually "feeling" anything, without any access to the surrounding culture, it nevertheless:
- Organizes the faces in the same order around the circumplex
- Correlates very highly with human responses
- Has about the same rank order difficulty in classifying the emotions

Discussion
- The discrimination correlates with human results most accurately at a precategorization layer: the discrimination improvement at category boundaries is in the representation of the data, not based on the categories.
- These results suggest that for expression recognition, the notion of "categorical perception" simply is not necessary to explain the data.
- Indeed, most of the data can be explained by the interaction between the similarity of the representations and the categories imposed on the data: Fear faces are similar to surprise faces in our representation - so they are near each other in the circumplex.

Conclusions from this part of the talk
- The best models perform the same task people do.
- Concepts such as "similarity" and "categorization" need to be understood in terms of models that do these tasks.
- Our model simultaneously fits data supporting both categorical and continuous theories of emotion.
- The fits, we believe, are due to the interaction of the way the categories slice up the space of facial expressions, and the way facial expressions inherently resemble one another.
- It also suggests that the continuous theories are correct: "discrete categories" are not required to explain the data.
- We believe our results will easily generalize to other visual tasks, and other modalities.

The Visual Expertise Mystery
Why would a face area process BMWs?
- Behavioral & brain data
- Model of expertise
- Results

Are you a perceptual expert? Take the expertise test!!!**
"Identify this object with the first name that comes to mind."
**Courtesy of Jim Tanaka, University of Victoria

"Car" - Not an expert
"2002 BMW Series 7" - Expert!

"Bird" or "Blue Bird" - Not an expert
"Indigo Bunting" - Expert!

"Face" or "Man" - Not an expert
"George Dubya" - Expert!
"Jerk" or "Megalomaniac" - Democrat!

How is an object to be named? (Rosch et al., 1971)
- Superordinate level: Animal
- Basic level: Bird
- Subordinate (species) level: Indigo Bunting

Entry Point Recognition
(Diagram: visual analysis enters at the basic level - "Bird" is the entry point - with fine-grain visual analysis down to "Indigo Bunting" and semantic analysis up to "Animal": the Downward Shift Hypothesis.)

Dog and Bird Expert Study (Tanaka & Taylor, 1991)
- Each expert had at least 10 years of experience in their respective domain of expertise.
- None of the participants were experts in both dogs and birds.
- Participants provided their own controls.

Object Verification Task
(Diagram: verify category labels at three levels - Superordinate: Animal vs. Plant; Basic: Bird vs. Dog; Subordinate: Robin vs. Sparrow - answering YES or NO.)

(Figure: mean reaction time (msec) at superordinate (Animal), basic (Bird/Dog), and subordinate (Robin/Beagle) levels for expert vs. novice domains, showing the downward shift.)
Dog and bird experts recognize objects in their domain of expertise at subordinate levels.

Is face recognition a general form of perceptual expertise?
(Images: George W. Bush, Indigo Bunting, 2002 Series 7 BMW.)

(Figure: mean reaction time (msec) at superordinate, basic, and subordinate levels for faces vs. objects (Tanaka, 2001), showing the downward shift.)
Face experts recognize faces at the individual level of unique identity.

Event-related Potentials and Expertise
(Figures: the N170 component for face experts (Bentin, Allison, Puce, Perez & McCarthy, 1996) and for object experts in novice vs. expert domains (Tanaka & Curran, 2001; see also Gauthier, Curran, Curby & Collins, 2003, Nature Neuro.).)

Neuroimaging of face, bird and car experts (Gauthier et al., 2000)
(Figure: fusiform gyrus activation for "face experts," car experts (Cars-Objects), and bird experts (Birds-Objects).)

How to identify an expert?
Behavioral benchmarks of expertise:
- Downward shift in entry point recognition
- Improved discrimination of novel exemplars from learned and related categories
Neurological benchmarks of expertise:
- Enhancement of N170 ERP brain component
- Increased activation of fusiform gyrus

End of Tanaka Slides

Kanwisher showed the FFA is specialized for faces
But she forgot to control for what?

Greeble Experts (Gauthier et al. 1999)
- Subjects trained over many hours to recognize individual Greebles.
- Activation of the FFA increased for Greebles as the training proceeded.

The "visual expertise mystery"
- If the so-called "Fusiform Face Area" (FFA) is specialized for face processing, then why would it also be used for cars, birds, dogs, or Greebles?
- Our view: the FFA is an area associated with a process: fine level discrimination of homogeneous categories.
- But the question remains: why would an area that presumably starts as a face area get recruited for these other visual tasks? Surely, they don't share features, do they?
Sugimoto & Cottrell (2001), Proceedings of the Cognitive Science Society

Solving the mystery with models
Main idea:
- There are multiple visual areas that could compete to be the Greeble expert - "basic" level areas and the "expert" (FFA) area.
- The expert area must use features that distinguish similar looking inputs - that's what makes it an expert.
- Perhaps these features will be useful for other fine-level discrimination tasks.
We will create:
- Basic level models - trained to identify an object's class.
- Expert level models - trained to identify individual objects.
Then we will put them in a race to become Greeble experts, and deconstruct the winner to see why they won.
Sugimoto & Cottrell (2001), Proceedings of the Cognitive Science Society

Model Database
- A network that can differentiate faces, books, cups and cans is a "basic level network."
- A network that can also differentiate individuals within ONE class (faces, cups, cans OR books) is an "expert."

Model
- Pretrain two groups of neural networks on different tasks.
- Compare their abilities to learn a new individual Greeble classification task (see the sketch below).
(Diagram: experts map inputs through a hidden layer to individuals - Bob, Carol, Ted, plus can, cup, book - while non-experts map to classes - face, can, cup, book; both then learn Greeble1, Greeble2, Greeble3.)
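
A compact sketch of the race (stimuli are random stand-ins; the real model used the Gabor/PCA front end above, and network sizes and criteria here are assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    def train(X, T, W1, W2, lr=0.1, max_epochs=5000):
        # Plain backprop; returns epochs needed to reach a low-error criterion
        for epoch in range(max_epochs):
            H = sigmoid(X @ W1)
            Y = sigmoid(H @ W2)
            if np.mean((T - Y) ** 2) < 0.01:
                return epoch
            dY = (T - Y) * Y * (1 - Y)
            dH = (dY @ W2.T) * H * (1 - H)
            W2 += lr * H.T @ dY
            W1 += lr * X.T @ dH
        return max_epochs

    X = rng.normal(size=(48, 40))                 # 48 stimuli, 40-dim codes
    T_basic = np.repeat(np.eye(4), 12, axis=0)    # 4-way basic-level task
    T_expert = np.tile(np.eye(12), (4, 1))        # 12-way individuation task

    race = {}
    for name, T in [('basic', T_basic), ('expert', T_expert)]:
        W1 = rng.normal(scale=0.3, size=(40, 15))
        W2 = rng.normal(scale=0.3, size=(15, T.shape[1]))
        train(X, T, W1, W2)                         # pretraining phase
        G = rng.normal(size=(12, 40))               # 12 novel "Greebles"
        W2g = rng.normal(scale=0.3, size=(15, 12))  # fresh output layer
        race[name] = train(G, np.eye(12), W1, W2g)  # epochs to learn Greebles

    print(race)  # in the paper, expert pretraining wins (fewer epochs)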

Expertise begets expertise
- Learning to individuate cups, cans, books, or faces first leads to faster learning of Greebles (can't try this with kids!!!).
- The more expertise, the faster the learning of the new task!
- Hence in a competition with the object area, FFA would win.
- If our parents were cans, the FCA (Fusiform Can Area) would win.
(Figure: amount of training required to be a Greeble expert vs. training time on the first task.)

Entry Level Shift: Subordinate RT decreases with training
(RT = uncertainty of response = 1.0 − max(output))
(Figure: human data and network data - basic vs. subordinate RT as a function of number of training sessions.)

How do experts learn the task?
- Expert level networks must be sensitive to within-class variation: representations must amplify small differences.
- Basic level networks must ignore within-class variation: representations should reduce differences.

Observing hidden layer representations
- Principal Components Analysis on hidden unit activations allows us to reduce the dimensionality (to 2) and plot representations (see the sketch below).
- We can then observe how tightly clustered stimuli are in a low-dimensional subspace.
- We expect basic level networks to separate classes, but not individuals.
- We expect expert networks to separate classes and individuals.
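
A sketch of the hidden-layer PCA plot (activations and labels are stand-ins):

    import numpy as np

    H = np.random.rand(120, 15)        # hidden activations: 120 stimuli x 15 units
    Hc = H - H.mean(axis=0)
    U, S, Vt = np.linalg.svd(Hc, full_matrices=False)
    xy = Hc @ Vt[:2].T                 # project each stimulus onto the top 2 PCs

    labels = np.repeat(np.arange(12), 10)   # stand-in: 12 individuals x 10 images
    # Spread of each individual's cluster; expert nets should keep these apart
    spread = {c: xy[labels == c].std(axis=0).mean() for c in range(12)}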

Subordinate level training magnifies small differences within object representations
(Figure: hidden-layer PCA plots for face, basic, and Greeble networks at 1, 80, and 1280 epochs.)

Greeble representations are spread out prior to Greeble training
(Figure: hidden-layer PCA plots for face, basic, and Greeble networks.)

Variability Decreases Learning Time
(Figure: Greeble learning time vs. Greeble variance prior to learning Greebles (r = ).)

Examining the Net's Representations
- We want to visualize "receptive fields" in the network.
- But the Gabor magnitude representation is noninvertible.
- We can learn an approximate inverse mapping, however.
- We used linear regression to find the best linear combination of Gabor magnitude principal components for each image pixel.
- Then projecting each hidden unit's weight vector into image space with the same mapping visualizes its "receptive field."

Two hidden unit receptive fields
(Figure: HU 16 and HU 36, after training as a face expert and after further training on Greebles.)
NOTE: These are not face-specific!

Controlling for the number of classes
We obtained 13 classes from hemera.com:
- 10 of these are learned at the basic level.
- 10 faces, each with 8 expressions, make the expert task.
- 3 (lamps, ships, swords) are used for the novel expertise task.

Results: Pre-training
- New initial tasks of similar difficulty: in previous work, the basic level task was much easier.
- These are the learning curves for the 10 object classes and the 10 faces.

Results
As before, experts still learned new expert level tasks faster.
(Figure: number of epochs to learn swords after learning faces or objects, vs. number of training epochs on faces or objects.)

Outline
- Why would a face area process BMWs?
- Why this model is wrong…

Background
I have a model of face and object processing that accounts for a lot of data…

The Face and Object Processing System
(Pipeline: Pixel (Retina) Level → Gabor Filtering (Perceptual/V1 Level) → PCA (Gestalt Level) → Hidden Layer → two classifiers at the Category Level: an Expert Level Classifier (Bob, Sue, Jane, Book, Can, Cup, Face; ~FFA) and a Basic Level Classifier (Book, Can, Cup, Face; ~LOC).)

Effects accounted for by this model
- Categorical effects in facial expression recognition (Dailey et al., 2002)
- Similarity effects in facial expression recognition (ibid)
- Why fear is "hard" (ibid)
- Rank of difficulty in configural, featural, and inverted face discrimination (Zhang & Cottrell, 2006)
- Holistic processing of identity and expression, and the lack of interaction between the two (Cottrell et al. 2000)
- How a specialized face processor may develop under realistic developmental constraints (Dailey & Cottrell, 1999)
- How the FFA could be recruited for other areas of expertise (Sugimoto & Cottrell, 2001; Joyce & Cottrell, 2004; Tran et al., 2004; Tong et al., 2005)
- The other race advantage in visual search (Haque & Cottrell, 2005)
- Generalization gradients in expertise (Nguyen & Cottrell, 2005)
- Priming effects (face or character) on discrimination of ambiguous Chinese characters (McCleery et al., under review)
- Age of Acquisition effects in face recognition (Lake & Cottrell, 2005; L&C, under review)
- Memory for faces (Dailey et al., 1998, 1999)
- Cultural effects in facial expression recognition (Dailey et al., in preparation)

Backprop, 25 years later
- Backprop is important because it was the first relatively efficient method for learning internal representations.
- Recent advances have made deeper networks possible.
- This is important because we don't know how the brain uses transformations to recognize objects across a wide array of variations (e.g., the Halle Berry neuron).

E.g., the "Halle Berry" neuron…

END