2806 Neural Computation: Learning Processes. Lecture 2, 2005. Ari Visa.

Agenda
- Some historical notes
- Learning
- Five basic learning rules
- Learning paradigms
- The issues of learning tasks
- Probabilistic and statistical aspects of the learning process
- Conclusion

Overview
What is meant by learning? The ability of the neural network (NN) to learn from its environment and to improve its performance through learning:
- The NN is stimulated by an environment
- The NN undergoes changes in its free parameters
- The NN responds in a new way to the environment

Some historical notes
- Pavlov's conditioning experiments: a conditioned response, salivation in response to an auditory stimulus
- Hebb: The Organization of Behavior (1949) -> Long-Term Potentiation, LTP (Bliss & Lomo 1973), AMPA receptor; Long-Term Depression, LTD, NMDA receptor
- The nearest neighbor rule: Fix & Hodges 1951

Some historical notes
- The idea of competitive learning: von der Malsburg 1973, the self-organization of orientation-sensitive nerve cells in the striate cortex
- Lateral inhibition -> Mach bands, Ernst Mach 1865
- Statistical thermodynamics in the study of computing machinery: John von Neumann, Theory and Organization of Complicated Automata, 1949

Some historical notes
- Reinforcement learning: Minsky 1961, Thorndike 1911
- The problem of designing an optimum linear filter: Kolmogorov 1942, Wiener 1949, Zadeh 1953, Gabor 1954

Definition of Learning
Learning is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameter changes take place. (Mendel & McLaren 1970)

Five Basic Learning Rules
- Error-correction learning <- optimum filtering
- Memory-based learning <- memorizing the training data explicitly
- Hebbian learning <- neurobiological
- Competitive learning <- neurobiological
- Boltzmann learning <- statistical mechanics

Five Basic Learning Rules 1/5
Error-Correction Learning
- error signal = desired response - output signal
- e_k(n) = d_k(n) - y_k(n)
- e_k(n) actuates a control mechanism that makes the output signal y_k(n) come closer to the desired response d_k(n) in a step-by-step manner

Five Basic Learning Rules 1/5
- A cost function E(n) = ½ e_k²(n) is the instantaneous value of the error energy -> a steady state
- The delta rule, or Widrow-Hoff rule:
  Δw_kj(n) = η e_k(n) x_j(n),
  where η is the learning-rate parameter
- The adjustment made to a synaptic weight of a neuron is proportional to the product of the error signal and the input signal of the synapse in question.
- w_kj(n+1) = w_kj(n) + Δw_kj(n)
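As an illustration (not part of the original slides; the function name and the toy linear target are mine), the delta rule Δw_kj = η e_k x_j can be sketched in a few lines of Python:

```python
import random

def delta_rule_step(w, x, d, eta=0.1):
    """One error-correction step: w <- w + eta * e * x, with e = d - y."""
    y = sum(wi * xi for wi, xi in zip(w, x))   # linear neuron output y_k(n)
    e = d - y                                  # error signal e_k(n) = d_k(n) - y_k(n)
    return [wi + eta * e * xi for wi, xi in zip(w, x)], e

random.seed(0)
w = [0.0, 0.0]
for _ in range(2000):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    w, e = delta_rule_step(w, x, 2 * x[0] - x[1])   # teach d = 2*x1 - x2
```

With a small learning rate the weights settle toward the coefficients of the underlying linear map, here approximately (2, -1).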

Five Basic Learning Rules 2/5
Memory-Based Learning: all of the past experiences are explicitly stored in a large memory of correctly classified input-output examples {(x_i, d_i)}, i = 1, ..., N

Five Basic Learning Rules 2/5
- Criterion used for defining the local neighborhood of the test vector x_test
- Learning rule applied to the training examples in the local neighborhood of x_test
- Nearest neighbor rule: the vector x'_N ∈ {x_1, x_2, ..., x_N} is the nearest neighbor of x_test if
  min_i d(x_i, x_test) = d(x'_N, x_test)

Five Basic Learning Rules 2/5
- If the classified examples (x_i, d_i) are independently and identically distributed according to the joint probability distribution of the example (x, d),
- and if the sample size N is infinitely large,
- then the classification error incurred by the nearest neighbor rule is bounded above by twice the Bayes probability of error.

Five Basic Learning Rules 2/5
k-nearest neighbor classifier:
- Identify the k classified patterns that lie nearest to the test vector x_test, for some integer k.
- Assign x_test to the class that is most frequently represented among those k nearest neighbors.
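The two steps above translate directly into code; this minimal k-NN sketch (illustrative, with names and toy data of my own) stores the examples and votes among the k closest:

```python
from collections import Counter

def knn_classify(train, x_test, k=3):
    """Vote among the k training patterns nearest to x_test."""
    sq_dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = sorted(train, key=lambda pair: sq_dist(pair[0], x_test))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [([0.0, 0.0], "A"), ([0.1, 0.2], "A"), ([0.2, 0.1], "A"),
         ([1.0, 1.0], "B"), ([0.9, 1.1], "B")]
label = knn_classify(train, [0.15, 0.1], k=3)   # lands in the "A" cluster
```

Note that there is no training phase at all: the "memory" is the stored example set itself, exactly as the memory-based learning slide states.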

Five Basic Learning Rules 3/5
Hebbian Learning:
1. If two neurons on either side of a synapse (connection) are activated simultaneously, then the strength of that synapse is selectively increased.
2. If two neurons on either side of a synapse are activated asynchronously, then that synapse is selectively weakened or eliminated.

Five Basic Learning Rules 3/5
A Hebbian synapse is characterized by:
1. a time-dependent mechanism
2. a local mechanism (spatiotemporal contiguity)
3. an interactive mechanism
4. a conjunctional or correlational mechanism
-> A Hebbian synapse increases its strength with positively correlated presynaptic and postsynaptic signals, and decreases its strength when the signals are either uncorrelated or negatively correlated.

Five Basic Learning Rules 3/5
Hebbian learning in mathematical terms:
- Δw_kj(n) = F(y_k(n), x_j(n))
- The simplest form: Δw_kj(n) = η y_k(n) x_j(n)
- Covariance hypothesis: Δw_kj = η (x_j - x̄)(y_k - ȳ)

Five Basic Learning Rules 3/5
Note that:
1. Synaptic weight w_kj is enhanced if the conditions x_j > x̄ and y_k > ȳ are both satisfied.
2. Synaptic weight w_kj is depressed if either x_j > x̄ and y_k < ȳ, or y_k > ȳ and x_j < x̄.
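A minimal numeric sketch of the covariance hypothesis (the toy signals are my own, not from the slides): positively correlated pre- and postsynaptic activity drives the weight up, negatively correlated activity drives it down:

```python
def hebb_covariance_update(w, xs, ys, eta=0.01):
    """Accumulate dw = eta * (x - x_mean) * (y - y_mean) over paired signals."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    for x, y in zip(xs, ys):
        w += eta * (x - x_mean) * (y - y_mean)
    return w

w_up = hebb_covariance_update(0.0, [1, 2, 3, 4], [1, 2, 3, 4])    # correlated: w grows
w_down = hebb_covariance_update(0.0, [1, 2, 3, 4], [4, 3, 2, 1])  # anti-correlated: w shrinks
```

The update is simply η times the (unnormalized) sample covariance of the two signals, which is why uncorrelated activity leaves the weight unchanged on average.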

Five Basic Learning Rules 4/5
Competitive Learning:
The output neurons of a neural network compete among themselves to become active.
- a set of neurons that are all the same (except for their synaptic weights)
- a limit imposed on the strength of each neuron
- a mechanism that permits the neurons to compete -> a winner-takes-all

Five Basic Learning Rules 4/5
The standard competitive learning rule:
- Δw_kj = η (x_j - w_kj)  if neuron k wins the competition
- Δw_kj = 0               if neuron k loses the competition
Note: all the neurons in the network are constrained to have weight vectors of the same length.
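The winner-takes-all rule can be sketched as follows (illustrative code; using Euclidean distance to decide the winner is one common choice, not something the slide prescribes):

```python
def competitive_step(weights, x, eta=0.5):
    """Winner-takes-all: move only the closest weight vector toward x."""
    sq_dist = lambda w: sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    k = min(range(len(weights)), key=lambda i: sq_dist(weights[i]))
    # standard rule: dw_kj = eta * (x_j - w_kj) for the winner only
    weights[k] = [wi + eta * (xi - wi) for wi, xi in zip(weights[k], x)]
    return k

weights = [[0.0, 0.0], [1.0, 1.0]]               # two competing neurons
winner = competitive_step(weights, [0.2, 0.0])   # neuron 0 is closer and wins
```

Only the winning neuron's weight vector moves toward the input; the loser is left untouched, which over many presentations makes each neuron specialize on a cluster of inputs.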

Five Basic Learning Rules 5/5
Boltzmann Learning:
The neurons constitute a recurrent structure and operate in a binary manner. The machine is characterized by an energy function E:
E = -½ Σ_j Σ_k w_kj x_k x_j,  j ≠ k
The machine operates by choosing a neuron k at random and flipping its state from x_k to -x_k at some temperature T with probability
P(x_k -> -x_k) = 1 / (1 + exp(-ΔE_k / T))

Five Basic Learning Rules 5/5
- Clamped condition: the visible neurons are all clamped onto specific states determined by the environment
- Free-running condition: all the neurons (visible and hidden) are allowed to operate freely
The Boltzmann learning rule:
Δw_kj = η (ρ⁺_kj - ρ⁻_kj),  j ≠ k
where ρ⁺_kj and ρ⁻_kj are the correlations in the clamped and free-running conditions; note that both range in value from -1 to +1.
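The stochastic flip step can be sketched directly from the formulas above (illustrative code; it follows the slide's probability expression, with ΔE_k computed for a symmetric weight matrix and zero self-coupling):

```python
import math, random

def boltzmann_flip(x, k, w, T):
    """Attempt to flip x[k] -> -x[k]; returns the flip probability used."""
    # dE_k: energy change from flipping neuron k in E = -1/2 sum w_kj x_k x_j
    dE = 2 * x[k] * sum(w[k][j] * x[j] for j in range(len(x)) if j != k)
    p = 1.0 / (1.0 + math.exp(-dE / T))     # the slide's P(x_k -> -x_k)
    if random.random() < p:
        x[k] = -x[k]
    return p

random.seed(0)
x = [1, 1]                       # binary neuron states
w = [[0.0, 1.0], [1.0, 0.0]]     # symmetric weights, zero diagonal
p = boltzmann_flip(x, 0, w, T=1.0)
```

Running this flip step repeatedly over randomly chosen neurons, at both the clamped and free-running conditions, is what produces the correlations ρ⁺_kj and ρ⁻_kj used by the learning rule.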

Learning Paradigms
Credit assignment: The credit-assignment problem is the problem of assigning credit or blame for overall outcomes to each of the internal decisions made by the learning machine that contributed to those outcomes.
1. The temporal credit-assignment problem: it involves the instants of time when the actions that deserve credit were actually taken.
2. The structural credit-assignment problem: it involves assigning credit to the internal structures of actions generated by the system.

Learning Paradigms
Learning with a Teacher (= supervised learning)
- The teacher has knowledge of the environment
- Error-performance surface

Learning Paradigms
Learning without a Teacher: no labeled examples of the function to be learned are available.
1) Reinforcement learning
2) Unsupervised learning

Learning Paradigms
1) Reinforcement learning: The learning of an input-output mapping is performed through continued interaction with the environment in order to minimize a scalar index of performance.

Learning Paradigms
Delayed reinforcement means that the system observes a temporal sequence of stimuli. It is difficult to perform for two reasons:
- There is no teacher to provide a desired response at each step of the learning process.
- The delay incurred in the generation of the primary reinforcement signal implies that the machine must solve a temporal credit-assignment problem.
Reinforcement learning is closely related to dynamic programming.

Learning Paradigms
Unsupervised Learning: There is no external teacher or critic to oversee the learning process. Instead, provision is made for a task-independent measure of the quality of the representation that the network is required to learn.

The Issues of Learning Tasks
- An associative memory is a brainlike distributed memory that learns by association.
- Autoassociation: A neural network is required to store a set of patterns by repeatedly presenting them to the network. The network is then presented a partial description of an original pattern stored in it, and the task is to retrieve that particular pattern.
- Heteroassociation: It differs from autoassociation in that an arbitrary set of input patterns is paired with another arbitrary set of output patterns.

The Issues of Learning Tasks
- Let x_k denote a key pattern and y_k a memorized pattern. The pattern association is described by
  x_k -> y_k,  k = 1, 2, ..., q
- In an autoassociative memory, x_k = y_k
- In a heteroassociative memory, x_k ≠ y_k
- Storage phase
- Recall phase
- q is a direct measure of the storage capacity.
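The storage and recall phases can be made concrete with a correlation matrix memory sketch (illustrative code; exact recall assumes the key patterns are orthonormal, and the square pattern dimension is my simplification):

```python
def store(pairs, dim):
    """Storage phase: M = sum over k of the outer products y_k x_k^T."""
    M = [[0.0] * dim for _ in range(dim)]
    for x, y in pairs:
        for i in range(dim):
            for j in range(dim):
                M[i][j] += y[i] * x[j]
    return M

def recall(M, x):
    """Recall phase: y = M x (exact when the keys are orthonormal)."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

pairs = [([1.0, 0.0], [0.0, 1.0]),   # key x_1 -> memorized y_1
         ([0.0, 1.0], [1.0, 0.0])]   # key x_2 -> memorized y_2
M = store(pairs, 2)
```

Here q = 2 pairs are stored in one matrix; presenting a key x_k to `recall` returns the paired y_k, a heteroassociative memory in the sense above.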

The Issues of Learning Tasks
Pattern Recognition: The process whereby a received pattern/signal is assigned to one of a prescribed number of classes.

The Issues of Learning Tasks
Function Approximation: Consider a nonlinear input-output mapping d = f(x), where the vector x is the input and the vector d is the output. The function f(·) is assumed to be unknown. The requirement is to design a neural network that approximates the unknown function f(·):
‖F(x) - f(x)‖ < ε  for all x
- System identification
- Inverse system

The Issues of Learning Tasks
Control: The controller has to invert the plant's input-output behavior.
- Indirect learning
- Direct learning

The Issues of Learning Tasks
- Filtering
- Smoothing
- Prediction
- Cocktail party problem -> blind signal separation

The Issues of Learning Tasks
Beamforming: used in radar and sonar systems, where the primary task is to detect and track a target.

The Issues of Learning Tasks
- Memory: associative memory models
- Correlation Matrix Memory

The Issues of Learning Tasks
Adaptation: It is desirable for a neural network to continually adapt its free parameters to variations in the incoming signals in a real-time fashion.
- Pseudostationary over a window of short enough duration
- Continual training with time-ordered examples

Probabilistic and Statistical Aspects of the Learning Process
We do not have knowledge of the exact functional relationship between X and D ->
D = f(X) + ε,  a regressive model
- The mean value of the expectational error ε, given any realization of X, is zero.
- The expectational error ε is uncorrelated with the regression function f(X).

Probabilistic and Statistical Aspects of the Learning Process
Bias/Variance Dilemma:
- L_av(f(x), F(x,T)) = B²(w) + V(w)
- B(w) = E_T[F(x,T)] - E[D|X=x]  (an approximation error)
- V(w) = E_T[(F(x,T) - E_T[F(x,T)])²]  (an estimation error)
- NN -> small bias and large variance
- Introduce bias -> reduce variance
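A toy Monte-Carlo experiment (entirely my own setup, not from the slides) makes the two terms tangible: a deliberately rigid constant model F(x,T) = mean of the targets in T, fitted to noisy samples of f(x) = x², has a non-zero bias at a fixed point x0 plus a variance coming from the randomness of the training set T:

```python
import random

random.seed(1)
f = lambda x: x * x
x0, runs, n = 0.5, 2000, 10
preds = []
for _ in range(runs):
    # draw a fresh training set T of n noisy observations of f on [0, 1]
    xs = [random.uniform(0, 1) for _ in range(n)]
    ds = [f(x) + random.gauss(0.0, 0.1) for x in xs]
    preds.append(sum(ds) / n)          # F(x0, T): the constant model
mean_pred = sum(preds) / runs          # estimate of E_T[F(x0, T)]
bias2 = (mean_pred - f(x0)) ** 2       # B²: squared approximation error at x0
variance = sum((p - mean_pred) ** 2 for p in preds) / runs   # V: estimation error
```

Both terms come out clearly non-zero here: the constant model cannot represent x² (bias), and its fitted value fluctuates from one training set to the next (variance), illustrating the trade-off the slide describes.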

Probabilistic and Statistical Aspects of the Learning Process
The Vapnik-Chervonenkis (VC) dimension is a measure of the capacity, or expressive power, of the family of classification functions realized by the learning machine. The VC dimension of T is the largest N such that Δ_T(N) = 2^N. The VC dimension of the set of classification functions is the maximum number of training examples that can be learned by the machine without error, for all possible binary labelings of the training examples.

Probabilistic and Statistical Aspects of the Learning Process
- Let N denote an arbitrary feedforward network built up from neurons with a threshold (Heaviside) activation function. The VC dimension of N is O(W log W), where W is the total number of free parameters in the network.
- Let N denote a multilayer feedforward network whose neurons use a sigmoid activation function f(v) = 1/(1 + exp(-v)). The VC dimension of N is O(W²), where W is the total number of free parameters in the network.

Probabilistic and Statistical Aspects of the Learning Process
The method of structural risk minimization:
v_guarant(w) = v_train(w) + ε₁(N, h, δ, v_train)

Probabilistic and Statistical Aspects of the Learning Process
The probably approximately correct (PAC) model:
1. Any consistent learning algorithm for that neural network is a PAC learning algorithm.
2. There is a constant K such that a sufficient size of training set T for any such algorithm is
   N = (K/ε)(h log(1/ε) + log(1/δ)),
   where ε is the error parameter and δ is the confidence parameter.

Summary
- The five learning rules: error-correction learning, memory-based learning, Hebbian learning, competitive learning, and Boltzmann learning
- Statistical and probabilistic aspects of learning