Pattern Associators, Generalization, Processing Psych 85-419/719 Feb 6, 2001.


Pattern Associators, Generalization, Processing Psych 85-419/719 Feb 6, 2001

A Pattern Associator Consists of a set of input units, a set of output units, and connections from the input units to the output units... plus a training set of examples, each consisting of an input and its corresponding output
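As a concrete illustration, a pattern associator can be sketched as nothing more than a weight matrix from input units to output units, plus a list of input/target pairs. The class and pattern values below are made up for illustration (a minimal numpy sketch, not any particular simulator's API).

```python
import numpy as np

class PatternAssociator:
    """A minimal pattern associator: input units, output units,
    and a matrix of connection weights between them."""

    def __init__(self, n_inputs, n_outputs):
        # One weight for every input-to-output connection, starting at zero
        self.weights = np.zeros((n_outputs, n_inputs))

    def output(self, input_pattern):
        # Linear output units: each output is just the weighted sum of the inputs
        return self.weights @ np.asarray(input_pattern)

# The training set is just a list of (input pattern, target output pattern) pairs
training_set = [
    (np.array([1.0, -1.0]), np.array([1.0])),
    (np.array([1.0,  1.0]), np.array([-1.0])),
]

net = PatternAssociator(n_inputs=2, n_outputs=1)
print(net.output(training_set[0][0]))   # all zeros before any learning has happened
```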

Simple Generalization In Hebbian Learning

The Dot Product The sum of the products of corresponding elements of two vectors. When normalized for length, it is basically the correlation between the vectors. The angle between the vectors is the inverse cosine of the normalized dot product. When the dot product is 0 (that is, the angle is 90 degrees), the vectors are orthogonal
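These claims are easy to check numerically; a small sketch, using the two example vectors from the next slide:

```python
import numpy as np

a = np.array([1.0, 1.0])
b = np.array([1.0, -1.0])

dot = np.dot(a, b)                                        # sum of products of the elements
cosine = dot / (np.linalg.norm(a) * np.linalg.norm(b))    # dot product normalized for length
angle = np.degrees(np.arccos(cosine))                     # angle = inverse cosine of the normalized dot product

print(dot, cosine, angle)   # 0.0, 0.0, 90.0 -> these two vectors are orthogonal
```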

Geometrically... [Figure: the vectors (1,1) and (1,-1) plotted in the plane; they meet at a 90-degree angle, so their dot product is 0.]

So, Generalization in Hebb... After a single learning trial, generalization is proportional to: –the output from the trained trial, and –the correlation between the new test input and the learned input

After Multiple Training Trials... The output for a test pattern is a function of the sum of the dot products between the test input and each training input pattern, each multiplied by the output that that training pattern produced.
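The relationship described here can be verified directly. A minimal sketch, with made-up patterns and a learning rate of 1: train a linear associator with the Hebb rule (outer products of outputs and inputs), then check that the response to a new test input equals the sum, over training trials, of each trial's output weighted by the dot product between the test input and that trial's input.

```python
import numpy as np

lr = 1.0   # learning rate

# Two training pairs; the inputs are NOT orthogonal, so some cross-talk is expected
inputs  = [np.array([1.0,  1.0, 0.0]),
           np.array([1.0, -1.0, 1.0])]
targets = [np.array([ 1.0]),
           np.array([-1.0])]

# Hebbian learning: add the outer product of the (target) output and the input on each trial
W = np.zeros((1, 3))
for s, t in zip(inputs, targets):
    W += lr * np.outer(t, s)

# A new test input that is correlated with both training inputs
test = np.array([1.0, 0.5, 0.5])
direct = W @ test

# The same quantity computed as the slide describes: sum over training trials of
# (dot product of the test input with that trial's input) x (that trial's output)
predicted = sum(lr * np.dot(test, s) * t for s, t in zip(inputs, targets))

print(direct, predicted)   # identical: generalization is a dot-product-weighted sum of stored outputs
```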

Properties of Hebb Generalization If an input is uncorrelated with all of the training inputs, the output is zero. Otherwise, the output is a weighted combination of the outputs from all training trials –weighted by the correlations with the inputs on those training trials. If all of the training inputs are orthogonal to each other, there is no cross-talk
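Continuing the same kind of sketch: if the training inputs are mutually orthogonal (and, for exact recall, normalized to unit length), each trained input retrieves exactly its own output, and an unrelated input produces zero. Patterns are made up for illustration.

```python
import numpy as np

# Mutually orthogonal training inputs, normalized to unit length
inputs  = [np.array([1.0,  1.0, 0.0]) / np.sqrt(2),
           np.array([1.0, -1.0, 0.0]) / np.sqrt(2)]
targets = [np.array([1.0]), np.array([-1.0])]

W = np.zeros((1, 3))
for s, t in zip(inputs, targets):
    W += np.outer(t, s)                   # one-shot Hebbian learning

for s, t in zip(inputs, targets):
    print(W @ s, t)                       # each input retrieves exactly its own output: no cross-talk

print(W @ np.array([0.0, 0.0, 1.0]))      # an input uncorrelated with all training inputs gives zero output
```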

Cross-Talk in Delta Rule Learning Suppose we learn a given pattern with the delta rule. What happens when we present a test pattern that is similar to that learned pattern? The difference in its output is a function of the error on the learned pattern and the dot product of the learned input and the test input
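A quick check of this claim, with made-up patterns: after one delta-rule update on the learned pattern, the change in the test pattern's output equals the learning rate times the error on the learned pattern times the dot product of the two inputs.

```python
import numpy as np

eps = 0.1                                                  # learning rate
W = np.random.default_rng(0).normal(size=(1, 3)) * 0.1     # small random starting weights

s_learned = np.array([1.0, 0.0, 1.0])   # the pattern we train on
s_test    = np.array([1.0, 1.0, 0.0])   # a similar (correlated) test pattern
target    = np.array([1.0])

before = W @ s_test
error  = target - W @ s_learned                     # error on the learned pattern
W      = W + eps * np.outer(error, s_learned)       # one delta-rule update
after  = W @ s_test

print(after - before)                               # change in the test pattern's output
print(eps * error * np.dot(s_learned, s_test))      # = learning rate x error x dot product of the two inputs
```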

What Does This Mean? When our new item is similar to what we’ve been trained on, learning is easier if the output we want is close to the output we get from other examples. So, regular items (ones that have similar input-output relationships) don’t need a lot of training. Exceptions need more training.

Frequency of Regulars and Exceptions

Constraints on Learning With the Hebb rule, each training input needs to be orthogonal to every other one in order to be learned separately and avoid cross-talk. With the delta rule, the inputs just have to be linearly independent of each other to prevent one training trial from wrecking what was learned on the other trials –Linearly independent: no input vector can be produced as a weighted sum (linear combination) of the others
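To see why linear independence is enough for the delta rule, the sketch below trains on the two correlated but linearly independent vectors from the next slide's figure; with repeated sweeps, the delta rule drives both outputs to their targets even though the inputs are not orthogonal. Learning rate and targets are illustrative.

```python
import numpy as np

# Linearly independent but NOT orthogonal inputs (the vectors in the next slide's figure)
inputs  = np.array([[-0.5, 1.0],
                    [-1.0, 0.5]])
targets = np.array([[1.0],
                    [-1.0]])

eps = 0.2
W = np.zeros((1, 2))

# Repeated delta-rule sweeps through the training set
for epoch in range(200):
    for s, t in zip(inputs, targets):
        W += eps * np.outer(t - W @ s, s)

for s, t in zip(inputs, targets):
    print(W @ s, t)   # both outputs reach their targets, even though the inputs are correlated
```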

More Geometry... [Figure: (1,-1) and (1,1) are orthogonal vectors; (-.5,1) and (-1,.5) are linearly independent, but not orthogonal.]

Different Types of Activation Functions Linear: the output of a unit is simply the summed input to it. Linear threshold: the output is the summed input, but clipped so that it cannot go above or below a threshold. Stochastic: roll dice to decide the output, with probabilities determined by the input. Sigmoid: 1/(1+exp(-net))
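A sketch of these four functions in numpy. The linear-threshold version follows the wording above and simply clips the summed input at arbitrary lower and upper thresholds; for the stochastic unit, using the sigmoid of the net input as the firing probability is an assumption (one common choice), not something specified on the slide.

```python
import numpy as np

def linear(net):
    return net                                    # output is simply the summed input

def linear_threshold(net, lo=-1.0, hi=1.0):
    return np.clip(net, lo, hi)                   # summed input, clipped at the thresholds

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))             # smooth squashing function between 0 and 1

def stochastic_binary(net, rng=np.random.default_rng(0)):
    p = sigmoid(net)                              # assumed: firing probability given by the sigmoid
    return (rng.random(np.shape(net)) < p).astype(float)   # roll the dice

net = np.array([-2.0, 0.0, 2.0])
print(linear(net))
print(linear_threshold(net))
print(sigmoid(net))
print(stochastic_binary(net))
```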

Delta Rule for Non-Linear Units The weight change is the learning rate times the error times f′(net) times the input activation: Δw = ε (t − a) f′(net) a_input. For linear units, the f′(net) term equals 1. Otherwise, it’s the derivative of our activation function f, evaluated at the unit’s net input
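A sketch of this update for a single sigmoid output unit, with a made-up pattern, target, and learning rate: the weight change is the error times the derivative of the activation function at the current net input, times the input. Setting that derivative term to 1 recovers the plain linear-unit delta rule.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def sigmoid_deriv(net):
    a = sigmoid(net)
    return a * (1.0 - a)            # derivative of the sigmoid at this net input

eps = 0.5                           # learning rate (illustrative)
rng = np.random.default_rng(1)
w = rng.normal(size=3) * 0.1        # small random weights for one output unit

s = np.array([1.0, 0.0, 1.0])       # an input pattern (made up)
t = 1.0                             # its target

for step in range(500):
    net = np.dot(w, s)
    a = sigmoid(net)
    # Generalized delta rule: error x derivative of the activation function x input.
    # For a linear unit, the derivative term would simply be 1.
    w += eps * (t - a) * sigmoid_deriv(net) * s

print(sigmoid(np.dot(w, s)))        # climbs close to the target of 1
```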

So the Delta Rule Works Well For any activation function f that is differentiable. Linear: easily differentiable. Sigmoid: easily differentiable. Threshold… not so much. (What about other error functions besides sum-squared?)

Minimizing Sum Squared Error With unambiguous input, the network will converge to the correct output. With ambiguous or noisy input, it will converge to the output that minimizes the average squared distance from all of the targets –this is effectively regression! The outputs can then be read as a probability distribution (recall the IA Reading Model)
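The regression point can be made concrete. In the sketch below (made-up setup), the same input is paired with a target of 1 on 75% of trials and 0 otherwise; a linear unit trained with the delta rule settles near 0.75, the least-squares answer, which reads naturally as the probability that the target is 1.

```python
import numpy as np

eps = 0.02
w = np.zeros(2)
s = np.array([1.0, 1.0])            # the same (ambiguous) input on every trial
rng = np.random.default_rng(0)

for trial in range(20000):
    t = 1.0 if rng.random() < 0.75 else 0.0     # target is 1 on 75% of trials, 0 otherwise
    w += eps * (t - np.dot(w, s)) * s           # plain delta rule on a linear unit

print(np.dot(w, s))   # settles near 0.75 (give or take the noise of the last few updates):
                      # the mean target, i.e. the least-squares answer, readable as a probability
```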

Regression vs. Winner-Take-All In the Jets and Sharks model, activating a gang node activated the “winner” in the age group –the other ages were suppressed. With the delta rule, the output is proportional to the statistics of the training set. Which is better?

The Ideas From Ch 11 We can think of patterns as being correlated over units, rather than units being correlated over patterns. The same goes for targets. Based on this, we can see how much cross-talk there is between inputs, weights, or outputs

As Learning Progresses, Weights Become Aligned With Targets

Performance Measures Sum squared error –tss is the total sum squared error, summed over all patterns –pss is the sum squared error for the current pattern. We can also compare the actual output vector with the target vector –ndp is the normalized dot product –nvl is the normalized vector length: the magnitude of the output vector –vcor is the correlation between the two, ignoring magnitude

Unpacking... Suppose our targets were (-1, 1, -1) and our output was (-0.5, 0.5, -0.5). vcor, the correlation ignoring length, is perfect (1.0). The length (nvl) is less than 1; the output is not at full magnitude. So the overall performance measure (ndp) is not 1.
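A sketch that computes these measures for the example above, assuming the following plausible definitions: nvl as the length of the output relative to the target, vcor as the cosine between them (correlation ignoring magnitude), and ndp as the dot product normalized by the squared length of the target. The exact normalizations in the PDP software may differ.

```python
import numpy as np

target = np.array([-1.0, 1.0, -1.0])
output = np.array([-0.5, 0.5, -0.5])

pss = np.sum((target - output) ** 2)    # sum squared error for this pattern

nvl  = np.linalg.norm(output) / np.linalg.norm(target)    # output magnitude relative to the target
vcor = np.dot(output, target) / (np.linalg.norm(output) * np.linalg.norm(target))   # correlation, ignoring magnitude
ndp  = np.dot(output, target) / np.dot(target, target)    # normalized dot product (= nvl * vcor here)

print(pss, nvl, vcor, ndp)   # 0.75, 0.5, 1.0, 0.5 -> right direction, but only half the magnitude
```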

Back to Generalization Two-layer delta rule networks are great for picking up on regularities. But they can’t do XOR problems. Recall the regular and exception items (GAVE, WAVE, PAVE… HAVE). Are exceptions a form of XOR?
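To see the XOR limitation concretely, the sketch below trains a single sigmoid output unit (a two-layer network in the sense used here, plus a bias) on XOR with the delta rule. No setting of two weights and a bias separates the four cases, so the outputs stay near 0.5 and the error never gets low. Learning rate and initialization are arbitrary.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

# XOR: no single line through input space separates the 1 cases from the 0 cases
inputs  = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([0.0, 1.0, 1.0, 0.0])

rng = np.random.default_rng(0)
w = rng.normal(size=2) * 0.1    # two weights, input straight to output
b = 0.0                         # bias
eps = 0.5

for epoch in range(2000):
    for s, t in zip(inputs, targets):
        net = np.dot(w, s) + b
        a = sigmoid(net)
        delta = (t - a) * a * (1.0 - a)     # delta rule with the sigmoid derivative
        w += eps * delta * s
        b += eps * delta

outputs = sigmoid(inputs @ w + b)
print(outputs)                              # all four outputs hover near 0.5
print(np.sum((targets - outputs) ** 2))     # total error stays around 1.0: XOR is not linearly separable
```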

XOR and Exceptions It depends on your representation. With localist word units (for example), the inputs are linearly independent, and hence learnable... but you don’t get decent generalization with localist representations! This state of affairs led many to conclude that there were two systems for learning regulars and exceptions

Evidence for Two Systems Phonological dyslexics: impaired at rule application, more or less OK at exceptions. Surface dyslexics: OK at rule application, poor at exceptions. The conclusion many drew: there are two systems. One learns and applies the rules; the other has localist word nodes and handles the exceptions.

History of the Argument When this two-system account was put forward, it was not known how to train a network to handle XOR problems. Existing symbolic models could also pick up rules, but needed something else for the exceptions. BUT: starting next week, we’ll talk about learning rules that can handle the XOR problem.

The Zorzi et al. Model [Diagram: word input, a “lexical” representation, pronunciation, and a two-layer association trained with the delta rule.]

For Thursday… Topic: Distributed Representations. Read PDP1, Chapter 3. Optional: handout, Plaut & McClelland, “Stipulating versus discovering representations.” Optional: Science article, “Sparse population coding of faces in the inferotemporal cortex.” Look over homework #2.