An informal account of BackProp
For each pattern in the training set:
– Compute the error at the output nodes
– Compute Δw for each weight in the 2nd layer
– Compute delta (the generalized error expression) for the hidden units
– Compute Δw for each weight in the 1st layer
After amassing Δw for all weights, change each weight a little bit, as determined by the learning rate.
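In symbols (a sketch using the notation of the following slides, where t_i is the target and y_i the output of node i), each weight takes a small step down the gradient of the squared error, scaled by the learning rate η:

```latex
E = \tfrac{1}{2}\sum_i (t_i - y_i)^2,
\qquad
\Delta w = -\,\eta\,\frac{\partial E}{\partial w}.
```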

Back-Propagation Algorithm
We define the error term for a single node i to be t_i - y_i, where t_i is the target and y_i the node's output.
Each node computes a weighted sum of its inputs, x_i = ∑_j w_ij y_j, and passes it through the activation function, y_i = f(x_i).
The activation f is the sigmoid: f(x) = 1 / (1 + e^(-x)).

Backprop Details Here we go… Also refer to web notes for derivation

The output layer
Consider nodes k → j → i, with weights w_jk from the input layer into the hidden layer and w_ij from the hidden layer into the output layer; y_i is the output and t_i the target.
E = Error = ½ ∑_i (t_i - y_i)²
The derivative of the sigmoid is just f'(x) = f(x)(1 - f(x)), i.e. y_i(1 - y_i).
Gradient descent on E gives the output-layer update
Δw_ij = η δ_i y_j, with δ_i = (t_i - y_i) y_i(1 - y_i),
where η is the learning rate.

The hidden layer
Same network (k → j → i), with E = Error = ½ ∑_i (t_i - y_i)², outputs y_i and targets t_i.
A first-layer weight w_jk (from input unit k into hidden unit j) affects the error only through the output nodes that unit j feeds, so the chain rule gives
Δw_jk = η δ_j y_k, with δ_j = y_j(1 - y_j) ∑_i w_ij δ_i.

Backprop learning algorithm
n = 1; initialize w(n) randomly;
while (stopping criterion not satisfied and n < max_iterations)
    for each example (x, d)
        - run the network with input x and compute the output y
        - update the weights in backward order, starting from those of the output layer:
          w_ij ← w_ij + Δw_ij, with Δw_ij = η δ_i y_j and δ_i computed using the (generalized) Delta rule
    end-for
    n = n + 1;
end-while;

Backpropagation Algorithm
Initialize all weights to small random numbers.
For each training example do:
– For each hidden unit h: o_h = σ(∑_i w_hi x_i)
– For each output unit k: o_k = σ(∑_h w_kh o_h) and δ_k = o_k(1 - o_k)(t_k - o_k)
– For each hidden unit h: δ_h = o_h(1 - o_h) ∑_k w_kh δ_k
– Update each network weight w_ij: w_ij ← w_ij + Δw_ij, with Δw_ij = η δ_i x_ij (where x_ij is the input from unit j to unit i)
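A minimal runnable sketch of this algorithm for a single-hidden-layer sigmoid network. The network sizes, learning rate, epoch count, and toy XOR data are illustrative assumptions, not part of the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Toy data (XOR), purely illustrative.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

n_in, n_hid, n_out = 2, 3, 1
eta = 0.5  # learning rate

# Initialize all weights (and biases) to small random numbers.
W1 = rng.normal(scale=0.1, size=(n_in, n_hid));  b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.1, size=(n_hid, n_out)); b2 = np.zeros(n_out)

for epoch in range(10000):
    for x, t in zip(X, T):
        # Forward pass: hidden activations, then output activations.
        o_h = sigmoid(x @ W1 + b1)
        o_k = sigmoid(o_h @ W2 + b2)

        # Backward pass: deltas for output units, then hidden units.
        delta_k = o_k * (1 - o_k) * (t - o_k)
        delta_h = o_h * (1 - o_h) * (W2 @ delta_k)

        # Weight updates: Δw = η · δ · (input feeding that weight).
        W2 += eta * np.outer(o_h, delta_k); b2 += eta * delta_k
        W1 += eta * np.outer(x, delta_h);   b1 += eta * delta_h

# Show the network's outputs after training.
print(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).round(2))
```

Per-example (stochastic) updates are used here, matching the "for each training example" loop on the slide.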

What if all the input-to-hidden weights are initially equal?

Momentum term
The speed of learning is governed by the learning rate.
– If the rate is low, convergence is slow.
– If the rate is too high, the error oscillates without reaching a minimum.
Momentum tends to smooth out small fluctuations in the weight error: it accelerates the descent in steady downhill directions and has a stabilizing effect in directions that oscillate in time.
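A sketch of a weight update with a momentum term added. The names eta (learning rate) and alpha (momentum coefficient) and their values are illustrative assumptions: each step blends the current gradient step with a fraction of the previous update.

```python
import numpy as np

eta, alpha = 0.1, 0.9          # learning rate and momentum coefficient (illustrative values)
w = np.zeros(5)                # some weight vector
prev_dw = np.zeros_like(w)     # previous update, Δw(t-1)

def update(w, grad, prev_dw):
    """Gradient descent with momentum: Δw(t) = -η·grad + α·Δw(t-1)."""
    dw = -eta * grad + alpha * prev_dw
    return w + dw, dw

# Example: one step against an arbitrary gradient.
grad = np.array([0.2, -0.1, 0.0, 0.3, -0.4])
w, prev_dw = update(w, grad, prev_dw)
```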

Convergence
May get stuck in local minima. Weights may diverge. …but it works well in practice.
Representation power:
– 2-layer networks: any continuous function
– 3-layer networks: any function

Local Minimum
One remedy: use a random component, e.g. simulated annealing.

Overfitting and generalization
Too many hidden nodes tend to overfit.

Overfitting in ANNs

Early Stopping (Important!!!) Stop training when error goes up on validation set

Stopping criteria
Sensible stopping criteria:
– Total mean squared error change: back-prop is considered to have converged when the absolute rate of change of the average squared error per epoch is sufficiently small (typically in the range [0.01, 0.1]).
– Generalization-based criterion: after each epoch the network is tested for generalization; stop when generalization performance is adequate. With this criterion, the part of the training set held out for testing generalization is not used for updating the weights.
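A sketch of the generalization-based criterion combined with early stopping: keep the best weights seen on a held-out validation set and stop once validation error stops improving. The network object and its train_one_epoch, evaluate, get_weights, and set_weights helpers are hypothetical stand-ins for whatever backprop code is in use.

```python
def train_with_early_stopping(net, train_set, val_set,
                              patience=10, max_epochs=1000):
    # `net.train_one_epoch`, `net.evaluate`, `net.get_weights`, `net.set_weights`
    # are hypothetical helpers, not functions defined in the slides.
    best_err = float("inf")
    best_weights = net.get_weights()
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        net.train_one_epoch(train_set)      # one pass of backprop updates
        val_err = net.evaluate(val_set)     # error on the held-out validation set

        if val_err < best_err:
            best_err, best_weights = val_err, net.get_weights()
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                       # validation error stopped improving

    net.set_weights(best_weights)           # restore the best weights seen
    return net
```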

Architectural Considerations
What is the right size network for a given job? How many hidden units?
– Too many: no generalization
– Too few: no solution
Possible answer: constructive algorithms, e.g. Cascade Correlation (Fahlman & Lebiere, 1990).

Network Topology
The number of layers and of neurons depends on the specific task; in practice the issue is settled by trial and error. Two types of adaptive algorithm can be used:
– start from a large network and successively remove nodes and links until performance degrades;
– begin with a small network and introduce new neurons until performance is satisfactory.

Supervised vs Unsupervised Learning
Backprop requires a 'target': how realistic is that?
Hebbian learning is unsupervised, but limited in power.
How can we combine the power of backprop (and friends) with the ideal of unsupervised learning?

Autoassociative Networks
Input a pattern and use a copy of the input as the target: the network is trained to reproduce the input at the output layer.
This is non-trivial if the number of hidden units is smaller than the number of inputs/outputs, so the network is forced to develop compressed representations of the patterns.
Hidden-unit representations may reveal natural kinds (e.g. vowels vs consonants). The problem of an explicit teacher is circumvented.
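A sketch of the idea on the classic 8-3-8 encoder problem: the targets are a copy of the inputs and the hidden layer is narrower than the input, so the network must learn a compressed code. The data, sizes, and training settings are illustrative assumptions; the update rules are the same as in the earlier backprop sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)

X = np.eye(8)              # 8 one-hot patterns (the 8-3-8 encoder problem)
T = X                      # autoassociation: the target is a copy of the input

n_in, n_hid, n_out = 8, 3, 8   # bottleneck: fewer hidden units than inputs/outputs
eta = 0.5

W1 = rng.normal(scale=0.1, size=(n_in, n_hid));  b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.1, size=(n_hid, n_out)); b2 = np.zeros(n_out)

for epoch in range(20000):
    for x, t in zip(X, T):
        o_h = sigmoid(x @ W1 + b1)                 # compressed (3-unit) code
        o_k = sigmoid(o_h @ W2 + b2)               # reconstruction of the input
        delta_k = o_k * (1 - o_k) * (t - o_k)
        delta_h = o_h * (1 - o_h) * (W2 @ delta_k)
        W2 += eta * np.outer(o_h, delta_k); b2 += eta * delta_k
        W1 += eta * np.outer(x, delta_h);   b1 += eta * delta_h

# Inspect the hidden codes; with enough training they tend toward a
# compact distributed encoding of the 8 patterns.
print(sigmoid(X @ W1 + b1).round(1))
```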

Problems and Networks
Some problems have natural "good" solutions. Solving a problem may be possible by providing the right armory of general-purpose tools and recruiting them as needed.
Networks are general-purpose tools. The choice of network type, training, architecture, etc. greatly influences the chances of successfully solving a problem.
Tension: tailoring tools for a specific job vs exploiting a general-purpose learning mechanism.

Summary
Multiple-layer feed-forward networks:
– Replace the step function with the sigmoid (differentiable) function
– Learn weights by gradient descent on an error function
– Backpropagation algorithm for learning
– Avoid overfitting by early stopping

ALVINN drives 70mph on highways

Use MLP Neural Networks when…
– Real (possibly vector-valued) inputs and real (possibly vector-valued) outputs
– You're not interested in understanding how it works
– Long training times are acceptable
– Short execution (prediction) times are required
– Robustness to noise in the dataset is needed

Applications of FFNN
Classification and pattern recognition: FFNNs can be applied to non-linearly separable learning problems, e.g.
– recognizing printed or handwritten characters,
– face recognition,
– classification of loan applications into credit-worthy and non-credit-worthy groups,
– analysis of sonar and radar signals to determine the nature of the source of a signal.
Regression and forecasting: FFNNs can be applied to learn non-linear functions (regression), and in particular functions whose input is a sequence of measurements over time (time series).

Extensions of Backprop Nets Recurrent Architectures Backprop through time

Elman Nets & Jordan Nets
Updating the context as we receive input. In Jordan nets we model "forgetting" as well (a decay parameter α on the context). The recurrent connections have fixed weights. You can train these networks using good ol' backprop.
[Diagrams: two networks, each with Input, Hidden, Context, and Output layers; the recurrent copy connection has fixed weight 1, and the Jordan net's context additionally carries the decay/forgetting weight α.]
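A sketch of an Elman-style forward pass: the context layer holds a copy of the previous hidden state (copied back with fixed weight 1) and feeds into the hidden layer together with the current input. Layer sizes and weights are illustrative assumptions; training would use ordinary backprop, as the slide notes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
n_in, n_hid, n_out = 4, 5, 3            # illustrative sizes

W_in  = rng.normal(scale=0.1, size=(n_in,  n_hid))   # input   -> hidden
W_ctx = rng.normal(scale=0.1, size=(n_hid, n_hid))   # context -> hidden (trainable)
W_out = rng.normal(scale=0.1, size=(n_hid, n_out))   # hidden  -> output

def elman_forward(sequence):
    """Run a sequence of input vectors through an Elman-style net."""
    context = np.zeros(n_hid)            # context starts empty
    outputs = []
    for x in sequence:
        hidden = sigmoid(x @ W_in + context @ W_ctx)
        outputs.append(sigmoid(hidden @ W_out))
        context = hidden.copy()          # copy-back connection with fixed weight 1
    return outputs

seq = [rng.random(n_in) for _ in range(6)]
ys = elman_forward(seq)
```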

Recurrent Backprop
We'll pretend to step through the network one iteration at a time: unroll the recurrence (e.g. for 3 iterations), backprop as usual through the unrolled copies, but average the updates for equivalent weights (the copies of the same weight at different time steps).
[Diagram: a small recurrent net over units a, b, c with weights w1–w4, shown unrolled for 3 iterations; the highlighted edges are copies of the same weight.]
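A sketch of the weight-averaging idea: after backprop through the unrolled copies, the gradients obtained for the different copies of the same (equivalent) weight are averaged into a single update. The gradient values here are placeholders for whatever the unrolled backprop pass would produce.

```python
import numpy as np

# Suppose backprop through the unrolled net produced one gradient per time step
# for the same shared weight matrix W (placeholder values for illustration).
grads_per_step = [np.array([[0.2, -0.1]]),
                  np.array([[0.1,  0.0]]),
                  np.array([[-0.3, 0.2]])]   # 3 unrolled iterations

eta = 0.1
W = np.zeros((1, 2))                          # the shared ("equivalent") weight

# Average the per-copy gradients so every copy of W receives the same update.
avg_grad = sum(grads_per_step) / len(grads_per_step)
W = W - eta * avg_grad
```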

Connectionist Models in Cognitive Science
[Diagram: a space of models spanning Neural vs Conceptual and Existence proof vs Data fitting, with Structured, PDP (Elman), and Hybrid approaches placed within it.]

5 levels of Neural Theory of Language
[Diagram: the five levels are Cognition and Language, Computation, Structured Connectionism, Computational Neurobiology, and Biology, with example topics at each level (psycholinguistic experiments, grammar, metaphor, abstraction, spatial relations, motor control, SHRUTI, triangle nodes, neural nets and learning, neural development) and course milestones (quiz, midterm, finals) marked alongside.]

The Color Story: A Bridge between Levels of NTL

A Tour of the Visual System two regions of interest: –retina –LGN

The Physics of Light
Light: electromagnetic energy whose wavelength is between 400 nm and 700 nm (1 nm = 10⁻⁹ m). © Stephen E. Palmer, 2002

The Physics of Light Some examples of the spectra of light sources © Stephen E. Palmer, 2002

The Physics of Light
Some examples of the reflectance spectra of surfaces. [Figure: % photons reflected vs wavelength (nm) for red, yellow, blue, and purple surfaces.] © Stephen E. Palmer, 2002

The Psychophysical Correspondence
There is no simple functional description for the perceived color of all lights under all viewing conditions, but…
A helpful constraint: consider only physical spectra with normal distributions, characterized by their area, mean, and variance. © Stephen E. Palmer, 2002

The Psychophysical Correspondence
Mean ↔ Hue. [Figure: spectra plotted as # photons vs wavelength.] © Stephen E. Palmer, 2002

The Psychophysical Correspondence
Variance ↔ Saturation. [Figure: spectra plotted as # photons vs wavelength.] © Stephen E. Palmer, 2002

The Psychophysical Correspondence
Area ↔ Brightness. [Figure: spectra plotted as # photons vs wavelength.] © Stephen E. Palmer, 2002
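A small sketch of the correspondence in the last three slides: treating a spectrum as a distribution of photons over wavelength, its mean, variance, and area serve as rough proxies for hue, saturation, and brightness. The spectrum itself is made up for illustration.

```python
import numpy as np

# A made-up spectrum: photon counts at wavelengths from 400 to 700 nm.
wavelengths = np.arange(400, 701, 10, dtype=float)
photons = np.exp(-((wavelengths - 550.0) ** 2) / (2 * 40.0 ** 2))  # peaked near 550 nm

area = photons.sum()                                            # ~ brightness
mean = (wavelengths * photons).sum() / area                     # ~ hue
variance = ((wavelengths - mean) ** 2 * photons).sum() / area   # ~ saturation (broader = less saturated)

print(f"mean {mean:.1f} nm, variance {variance:.1f}, area {area:.2f}")
```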

Physiology of Color Vision
Two types of light-sensitive receptors:
– Cones: cone-shaped, less sensitive, operate in high light, color vision.
– Rods: rod-shaped, highly sensitive, operate at night, gray-scale vision.

The Microscopic View

Rods and Cones in the Retina

What Rods and Cones Detect
Notice that they aren't distributed evenly, and that rods are more sensitive to shorter wavelengths.

Physiology of Color Vision
Three kinds of cones: their absorption spectra implement the trichromatic theory.
Opponent processes:
R/G = L - M
G/R = M - L
B/Y = S - (M + L)
Y/B = (M + L) - S
© Stephen E. Palmer, 2002
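A tiny sketch applying the opponent-process formulas above to hypothetical L, M, S cone responses (the numbers are made up for illustration).

```python
# Opponent channels computed from made-up L, M, S cone responses,
# following the formulas on the slide.
L, M, S = 0.8, 0.6, 0.2

red_green   = L - M          # R/G = L - M
green_red   = M - L          # G/R = M - L
blue_yellow = S - (M + L)    # B/Y = S - (M + L)
yellow_blue = (M + L) - S    # Y/B = (M + L) - S

print(red_green, green_red, blue_yellow, yellow_blue)
```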

Physiology of Color Vision
Double opponent cells in V1. [Figure: Red/Green cells with G+R− and R+G− subregions; Blue/Yellow cells with Y+B− and B+Y− subregions.] © Stephen E. Palmer, 2002

Color Blindness Not everybody perceives colors in the same way! What numbers do you see in these displays? © Stephen E. Palmer, 2002

Theories of Color Vision
A dual-process wiring diagram. [Figure: a trichromatic stage feeding an opponent-process stage.] © Stephen E. Palmer, 2002

Color Naming © Stephen E. Palmer, 2002 Basic Color Terms (Berlin & Kay) Criteria: 1. Single words -- not “light-blue” or “blue-green” 2. Frequently used -- not “mauve” or “cyan” 3. Refer primarily to colors -- not “lime” or “gold” 4. Apply to any object -- not “roan” or “blond”

Color Naming © Stephen E. Palmer, 2002 BCTs in English Red Green Blue Yellow Black White Gray Brown Purple Orange* Pink

Color Naming © Stephen E. Palmer, 2002 Five more BCTs in a study of 98 languages Light-Blue Warm Cool Light-Warm Dark-Cool

The WCS Color Chips Basic color terms: –Single word (not blue-green) –Frequently used (not mauve) –Refers primarily to colors (not lime) –Applies to any object (not blonde) FYI: English has 11 basic color terms

Results of Kay's Color Study
If you group languages by the number of basic color terms they have, then as the number of terms increases, the additional terms specify focal colors.
[Table: the sequence of basic color terms across Stages I–VII. Stage I has two composite terms (White-or-Red-or-Yellow vs Black-or-Green-or-Blue); over later stages these split into White, Red, Yellow, Green, Blue, and Black; Stage VI adds Brown (Y+Bk); Stage VII adds the derived terms Pink (R+W), Purple (R+Bu), Orange (R+Y), and Grey (Bk+W).]

Color Naming © Stephen E. Palmer, 2002 Typical “developmental” sequence of BCTs

Color Naming
Color categories were studied in two ways: boundaries and best examples (Berlin & Kay). © Stephen E. Palmer, 2002

Color Naming
Memory: focal colors are remembered better than nonfocal colors.
Learning: new color categories centered on focal colors are learned faster.
Categorization: focal colors are categorized more quickly than nonfocal colors. (Rosch)
© Stephen E. Palmer, 2002

Color Naming
Degree of membership: fuzzy set theory (Zadeh); a fuzzy-logical model of color naming (Kay & McDaniel). © Stephen E. Palmer, 2002

Color Naming © Stephen E. Palmer, 2002 “Primary” color categories

Color Naming © Stephen E. Palmer, 2002 “Primary” color categories Red Green Blue Yellow Black White

Color Naming
"Derived" color categories: fuzzy-logical "AND_f". © Stephen E. Palmer, 2002

Color Naming
"Derived" color categories:
Orange = Red AND_f Yellow
Purple = Red AND_f Blue
Gray = Black AND_f White
Pink = Red AND_f White
Brown = Yellow AND_f Black
(Goluboi = Blue AND_f White)
© Stephen E. Palmer, 2002

Color Naming
"Composite" color categories: fuzzy-logical "OR_f".
Warm = Red OR_f Yellow
Cool = Blue OR_f Green
Light-warm = White OR_f Warm
Dark-cool = Black OR_f Cool
© Stephen E. Palmer, 2002
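A small sketch of how these fuzzy combinations could be computed. As an illustrative assumption, AND_f and OR_f are implemented with Zadeh's min and max operators over made-up membership values; Kay & McDaniel's actual model defines its own membership functions.

```python
# Fuzzy membership values for some stimulus in the "primary" categories
# (made-up numbers for illustration).
membership = {"red": 0.7, "yellow": 0.5, "blue": 0.1,
              "green": 0.2, "black": 0.0, "white": 0.3}

# Illustrative fuzzy operators (Zadeh): AND_f as min, OR_f as max.
def and_f(a, b):
    return min(a, b)

def or_f(a, b):
    return max(a, b)

orange = and_f(membership["red"], membership["yellow"])   # "derived" category
warm   = or_f(membership["red"], membership["yellow"])    # "composite" category

print(f"orange: {orange}, warm: {warm}")
```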

Color Naming © Stephen E. Palmer, 2002