1 Learning Processes
CS679 lecture note by Jin Hyung Kim, Computer Science Department, KAIST

2 Learning
Learning is a process by which the free parameters of a NN are adapted through stimulation from the environment. Sequence of events: the network is stimulated by the environment, undergoes changes in its free parameters, and responds in a new way to the environment. A learning algorithm is a prescribed set of process steps that make a system learn - a way to adjust the synaptic weights of a neuron. There is no unique learning algorithm; rather, a kit of tools. This chapter covers five learning rules, learning paradigms, issues of learning tasks, and probabilistic and statistical aspects of learning.

3 Error Correction Learning (I)
Error signal: e_k(n) = d_k(n) - y_k(n), where n denotes the time step. The error signal activates a control mechanism for corrective adjustment of the synaptic weights, minimizing a cost function (index of performance) E(n) = (1/2) e_k^2(n), also called the instantaneous value of the error energy. Step-by-step adjustment continues until the system reaches a steady state, i.e. the synaptic weights are stabilized. Also called the delta rule or Widrow-Hoff rule.

4 Error Correction Learning (II)
Δw_kj(n) = η e_k(n) x_j(n), where η is the rate of learning (learning-rate parameter).
w_kj(n+1) = w_kj(n) + Δw_kj(n)
w_kj(n) = z^-1[w_kj(n+1)], where z^-1 is the unit-delay operator.
The adjustment is proportional to the product of the error signal and the input signal, so error-correction learning is local. The learning rate η determines the stability and the rate of convergence.

5 Memory-based Learning
Past experiences are stored in a memory of correctly classified input-output examples; classification retrieves and analyzes a "local neighborhood" of the test input. Essential ingredients: the criterion used for defining a local neighborhood, and the learning rule applied to the training examples. Nearest Neighbor Rule (NNR): the vector X'_n ∈ {X_1, X_2, …, X_N} is the nearest neighbor of X_test if min_i d(X_i, X_test) = d(X'_n, X_test); X_test is then assigned the class of X'_n.

6 Nearest Neighbor Rule
Cover and Hart: if the examples are independent and identically distributed and the sample size N is infinitely large, then error(NNR) ≤ 2 × error(Bayes rule) - half of the classification information is contained in the nearest neighbor. K-nearest neighbor rule: a variant of the NNR; select the k nearest neighbors of X_test and use a majority vote. It acts like an averaging device and discriminates against outliers. The radial-basis function network is a memory-based classifier.
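A minimal sketch of the k-nearest neighbor rule just described, using a plain majority vote (the function name and data are illustrative, not from the lecture):

```python
from collections import Counter
import math

def knn_classify(train, x_test, k=3):
    """Assign x_test the majority class among its k nearest neighbors."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], x_test))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((0, 0), 'A'), ((0, 1), 'A'), ((1, 0), 'A'),
         ((5, 5), 'B'), ((5, 6), 'B')]
print(knn_classify(train, (1, 1)))   # the 3 nearest neighbors are all 'A'
```

With k = 1 this reduces to the plain NNR; larger k is the averaging that discriminates against outliers.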

7 Hebbian Learning
If the two neurons on either side of a connection are activated simultaneously (synchronously), then the strength of that connection is increased; if they are activated asynchronously, then the strength is weakened or eliminated. A Hebbian synapse is: time dependent - it depends on the exact time of occurrence of the two signals; local - only locally available information is used; interactive - learning is done by the interaction of the two signals; conjunctional or correlational - it depends on the co-occurrence of the two signals. Hebbian learning is found in the hippocampus (presynaptic and postsynaptic signals).

8 Math Model of Hebbian Modification
Δw_kj(n) = F(y_k(n), x_j(n))
Hebb's hypothesis: Δw_kj(n) = η y_k(n) x_j(n), where η is the rate of learning; also called the activity product rule. Repeated application of x_j leads to exponential growth of the weight.
Covariance hypothesis: Δw_kj(n) = η (x_j - x̄)(y_k - ȳ), where x̄ and ȳ are time-averaged values of x_j and y_k.
1. w_kj is enhanced if x_j > x̄ and y_k > ȳ.
2. w_kj is depressed if (x_j > x̄ and y_k < ȳ) or (x_j < x̄ and y_k > ȳ).
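Both hypotheses amount to one-line weight updates. A sketch, vectorized over a layer, with names of my own choosing:

```python
import numpy as np

def hebb_update(W, x, y, eta=0.01):
    """Activity product rule: dW_kj = eta * y_k * x_j (outer product)."""
    return W + eta * np.outer(y, x)

def covariance_update(W, x, y, x_bar, y_bar, eta=0.01):
    """Covariance hypothesis: dW_kj = eta * (x_j - x_bar_j) * (y_k - y_bar_k)."""
    return W + eta * np.outer(y - y_bar, x - x_bar)

# Above-average x_j together with above-average y_k enhances the weight;
# mixed signs depress it, matching cases 1 and 2 on the slide.
W = np.zeros((1, 1))
W_up = covariance_update(W, np.array([1.0]), np.array([1.0]),
                         np.array([0.0]), np.array([0.0]))
```

Note how the plain Hebb rule only ever adds positive increments for positively correlated activity, which is the source of the exponential growth mentioned above; the covariance form can also depress weights.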

9 Competitive Learning
The output neurons of the NN compete to become active; only a single neuron is active at any one time - a salient feature for pattern classification. Basic elements: a set of neurons that are all the same except for their synaptic weight distributions, so that they respond differently to a given set of input patterns; a limit on the "strength" of each neuron; a mechanism by which the neurons compete to respond to a given input - winner-takes-all. Neurons learn to respond to specialized conditions and become feature detectors.

10 Competitive NN
Feedforward connections are excitatory; lateral connections are inhibitory (lateral inhibition). Structure: a layer of source (input) nodes and a single layer of output neurons.

11 Competitive Learning
Output signal: y_k = 1 if v_k > v_j for all j, j ≠ k, and y_k = 0 otherwise, where v_k is the induced local field and w_kj denotes the synaptic weight from input x_j to neuron k. Competitive learning rule: Δw_kj = η (x_j - w_kj) if neuron k wins the competition, and Δw_kj = 0 otherwise. If a neuron does not respond to a particular input, no learning takes place.

12 Competitive Learning
Assuming each input vector x has some constant Euclidean length, the network performs clustering through competitive learning.
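The clustering behavior can be sketched as follows. The winner is selected by minimum Euclidean distance, which is equivalent to maximum v_k when inputs and weights are normalized as assumed above; the initial prototypes are placed near the data to avoid dead units (a common practice, not stated on the slide), and all names and data are illustrative:

```python
import numpy as np

def competitive_step(W, x, eta=0.1):
    """Winner-takes-all: dw_kj = eta * (x_j - w_kj) for the winner k, else 0."""
    k = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # competition
    W[k] += eta * (x - W[k])                           # only the winner learns
    return k

data = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]    # two "clusters"
W = np.array([[0.9, 0.1], [0.1, 0.9]])                 # prototypes near data
for _ in range(200):
    for x in data:
        competitive_step(W, x)
```

Each row of W converges to the input it keeps winning, so each output neuron becomes a feature detector for one cluster.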

13 Boltzmann Learning
Rooted in statistical mechanics. A Boltzmann machine is a NN based on Boltzmann learning: the neurons constitute a recurrent structure and operate in a binary manner ("on": +1 and "off": -1), divided into visible neurons and hidden neurons. Energy function: E = -(1/2) Σ_j Σ_k w_kj x_k x_j with j ≠ k, which means there is no self-feedback. The goal of Boltzmann learning is to maximize the likelihood function; a set of synaptic weights is called a model of the environment if it leads to the same probability distribution of the states of the visible units.

14 Boltzmann Machine Operation
Choose a neuron k at random and flip its state from x_k to -x_k with probability P(x_k → -x_k) = 1 / (1 + exp(-ΔE_k / T)), where ΔE_k is the energy change resulting from such a flip and T is the pseudo-temperature. If this rule is applied repeatedly, the machine reaches thermal equilibrium. Two modes of operation - clamped condition: the visible neurons are clamped onto specific states; free-running condition: all neurons are allowed to operate freely.
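The flip rule can be written directly. This sketch follows the formula on the slide; note that sign conventions for ΔE_k vary across texts, so treat the sign here as an assumption, and the function names are my own:

```python
import math
import random

def flip_probability(delta_E, T):
    """P(x_k -> -x_k) = 1 / (1 + exp(-delta_E / T)): delta_E is the energy
    change resulting from the flip, T the pseudo-temperature."""
    return 1.0 / (1.0 + math.exp(-delta_E / T))

def maybe_flip(x_k, delta_E, T, rng=random.random):
    """Flip the chosen neuron's binary state with the probability above."""
    return -x_k if rng() < flip_probability(delta_E, T) else x_k
```

Applying maybe_flip repeatedly to randomly chosen neurons is what carries the machine to thermal equilibrium at temperature T.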

15 Boltzmann Learning Rule
Let ρ⁺_kj denote the correlation between the states of neurons j and k in the clamped condition, and let ρ⁻_kj denote the correlation between the states of neurons j and k in the free-running condition. Boltzmann learning rule (Hinton and Sejnowski, 1986): Δw_kj = η (ρ⁺_kj - ρ⁻_kj), j ≠ k, where η is a learning-rate parameter.

16 Credit-Assignment Problem
The problem of assigning credit or blame for an overall outcome to each of the internal decisions that produced it. Two sub-problems: assignment of credit to actions - the temporal credit assignment problem, concerning the time instants of actions: many actions are taken, so which action is responsible for the outcome? Assignment of credit to internal decisions - the structural credit assignment problem, concerning the internal structure of the action: multiple components contribute, so which one did best? The CAP occurs in error-correction learning in a multilayer feed-forward NN: how do we assign credit or blame for the actions of the hidden neurons?

17 Learning with a Teacher
Supervised learning: the teacher has knowledge of the environment to be learned, and input/desired-output pairs are given as a training set. Parameters are adjusted step by step, based on an error signal. The system performance measure is the mean-square error, or the sum of squared errors over the training sample, visualized as an error surface with the parameters as coordinates. Learning moves toward a minimum point of the error surface - which may not be a global minimum - using the gradient of the error surface, the direction of steepest descent. Good for pattern recognition and function approximation.

18 Learning without a Teacher
Reinforcement learning: no teacher provides a direct (desired) response at each step; for example, only good/bad or win/lose feedback is available. It must solve the temporal credit assignment problem, since the critic's feedback may arrive only at the time of the final output. Structure: the environment feeds a critic, which feeds the learning system - primary reinforcement comes from the environment, heuristic reinforcement from the critic.

19 Unsupervised Learning
Self-organized learning: no external teacher or critic. A task-independent measure of quality is required for learning, and the network parameters are optimized with respect to that measure. The competitive learning rule is one case of unsupervised learning.

20 Learning Tasks
Pattern association, pattern recognition, function approximation, control, filtering, beamforming.

21 Pattern Association
An associative memory is a distributed memory that learns by association, a predominant feature of human memory. It stores pattern pairs x_k → y_k, k = 1, 2, …, q, where q is the storage capacity. There are a storage phase and a recall phase: the NN is required to store a set of patterns by repeated presentation, and a partial description or distorted version of a pattern is used to retrieve the particular pattern; x_k acts as a stimulus that determines the location of the memorized pattern. Autoassociation: x_k = y_k. Heteroassociation: x_k ≠ y_k. The goal is to have q as large as possible while still recalling correctly.

22 Pattern Recognition
The process whereby a received pattern is assigned to one of a set of prescribed classes (categories). Pattern recognition by NN is statistical in nature: a pattern is a point in a multidimensional decision space, and decision boundaries, determined by training, delimit the region associated with each class. A typical pipeline maps an r-dimensional input pattern through unsupervised feature extraction to an m-dimensional feature vector, then through supervised classification to one of q output classes.

23 Function Approximation
Given a set of labeled examples {(x_i, d_i)} drawn from an unknown function f, find F which approximates f such that ||F(x) - f(x)|| < ε for all x. Supervised learning is good for this task. Examples: system identification - learn a NN model of an unknown system from its input-output pairs (x_i, d_i), driven by the error e_i between the system output d_i and the model output y_i; constructing an inverse of a system.
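As a concrete, deliberately simple instance of supervised function approximation: fit F to noisy labeled samples of an unknown f by least squares. The linear model and all values below are my own illustration, not the lecture's:

```python
import numpy as np

# Labeled examples {(x_i, d_i)} drawn from an unknown function f plus noise
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 50)
d = 2.0 * x + 1.0 + rng.normal(scale=0.01, size=x.size)

# F(x) = a*x + b chosen to minimize the sum of squared errors over the sample
a, b = np.polyfit(x, d, deg=1)
```

The fitted coefficients recover the underlying f closely, which is exactly the ||F - f|| < ε requirement on this toy problem; a NN plays the role of F when f is nonlinear and unknown.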

24 Control
Feedback control system: a controller drives a plant so that the output y tracks a reference signal d, using the error e = d - y to produce the control input u. Error-correction learning of the controller requires the Jacobian matrix of the plant. Indirect learning: using input-output measurements on the plant, construct a NN model of it; the NN model is then used to estimate the Jacobian. Direct learning: learn the controller on the plant itself.

25 Filtering
Extract information from a set of noisy data. Information-processing tasks: filtering - using data measured up to and including time n; smoothing - using data measured up to and after time n; prediction - predicting the value at time n + n₀ (n₀ > 0) using data measured up to time n. Cocktail party problem: focusing on one speaker in a noisy environment; it is believed that a preattentive, preconscious analysis is involved.

26 Blind source separation
Unknown source signals u₁(n), …, u_m(n) pass through an unknown mixing environment A to produce the observations x₁(n), …, x_m(n); a demixer W must produce outputs y₁(n), …, y_m(n) that recover the sources, without knowledge of A.

27 Memory
Relatively enduring neural alterations induced by interaction with the environment (the neurobiological definition); the stored pattern must be accessible to influence future behavior. An activity pattern is stored by a learning process, so memory and learning are connected. Short-term memory: a compilation of knowledge representing the current state of the environment. Long-term memory: knowledge stored for a long time or permanently.

28 Associative memory
Memory is distributed: both the stimulus pattern and the response pattern consist of data vectors, and information is stored as spatial patterns of neural activity. The stimulus pattern determines not only the storage location but also the address for retrieval. Although individual neurons are not reliable, the memory is. There may be interaction between the stored patterns (it is not a simple storage of patterns), so there is a possibility of error in the recall process.

29 Correlation Matrix Memory
Association of key vector x_k with memorized vector y_k: y_k = W(k) x_k, k = 1, 2, …, q. The total experience is accumulated as M_k = M_(k-1) + W(k). Adding W(k) to M_(k-1) loses the distinct identity of the mixture of contributions that form M_k, but the information about the stimuli is not lost.

30 Correlation Matrix memory
Learning: the estimate of the memory matrix is M̂ = Σ_(k=1..q) y_k x_kᵀ, called the outer product rule - a generalization of Hebb's postulate of learning.

31 Recall
y = M̂ x_j = y_j + v_j, where the key vectors are normalized to have unit length; v_j is the noise vector arising from crosstalk between the key x_j and the other stored keys.
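Storage and recall can be checked in a few lines; with orthonormal keys the noise vector v_j vanishes. The vectors below are illustrative:

```python
import numpy as np

keys = np.eye(3)                       # orthonormal key vectors x_k
memorized = np.array([[1.0, 2.0],
                      [3.0, 4.0],
                      [5.0, 6.0]])     # memorized vectors y_k

# Learning (outer product rule): M_hat = sum_k y_k x_k^T
M = sum(np.outer(y, x) for x, y in zip(keys, memorized))

# Recall: M_hat x_j = y_j + v_j; here v_j = 0 since the keys are orthonormal
recalled = M @ keys[1]
```

Replacing the identity keys with correlated vectors makes the crosstalk term v_j nonzero, which is the source of recall errors discussed in the following slides.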

32 Orthogonal set
If the key vectors form an orthonormal set, recall is perfect. Storage capacity is limited by the rank of the memory matrix (the number of independent columns), which is bounded by the dimensionality of the input space.

33 Error of Associative Memory
Real-life key vectors are neither orthogonal nor highly separated. There is a lower bound on the crosstalk noise; if it is large, the memory may fail to distinguish the stored patterns and may respond as if presented with patterns it has never seen before - termed animal logic.

34 Adaptation
Space and time are fundamental dimensions of learning; learning has a spatiotemporal nature. Stationary system: the learned parameters are frozen and then used later. Nonstationary system: continually adapts its parameters to variations in the incoming signal in a real-time fashion - continuous learning, or learning-on-the-fly. Pseudostationary: consider the signal stationary over a short time window, e.g. 10-30 ms for speech signals or a period of minutes for weather forecasting.

35 Statistical Nature of Learning Process
The training sample is denoted by T = {(x_i, d_i)}, i = 1, …, N. The functional relationship between X and D is D = f(X) + ε: a deterministic function f of the random variable X plus an expectational error ε. This is called a regressive model. The error ε has zero mean and is uncorrelated with the regression function f(X) (the principle of orthogonality).

36 Minimizing Cost Function
The cost function is J = E[(D - F(X))²], where E is the averaging (expectation) operator. It is a natural measure of effectiveness since, by the orthogonality of the error ε, J = E[ε²] + E[(f(X) - F(X))²], so minimizing J drives the approximating function F toward the regression function f.

37 Bias/Variance Dilemma
Bias represents the inability of the NN to approximate the regression function; variance represents the inadequacy of the information contained in the training set. We can reduce both bias and variance only with large training samples. Alternatively, purposely introduce a "harmless" bias to reduce the variance: a designed bias, i.e. a constrained network that takes prior knowledge into account.

38 Statistical Learning Theory
Addresses the fundamental issue of controlling the generalization ability of a learning machine. Principle of empirical risk minimization: minimize the empirical risk function over the training data, relying on uniform convergence of the empirical risk to the true risk. As training progresses, the likelihood of approximating functions F(x_i, w) consistent with the training data T increases.

39 Vapnik-Chervonenkis Dimension
A measure of the capacity or expressive power of the family of classification functions realized by the learning machine. The VC dimension, VCdim(ℱ), of an ensemble of dichotomies ℱ is the cardinality of the largest set L that is shattered by ℱ: the maximum number of training examples that can be classified by the machine in all possible ways without error. Example: for 4 points in general position in 2D space, the number of dichotomies Δ_ℱ(L) is 16; VCdim(ℱ) is the maximum N for which Δ_ℱ(L) = 2^N, here 4. Evaluating VCdim(·) is a difficult task, but its bounds are often tractable: a feedforward network with threshold units has VCdim O(W log W); with sigmoid units, O(W²); with purely linear units, O(W).
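The notion of shattering can be made concrete with a brute-force check over a small classifier family. This is a hand-rolled illustration, not from the lecture; for the 1-D threshold family below, the three sampled thresholds already produce every label pattern the full family can realize on these points:

```python
def shatters(points, classifiers):
    """True if the family realizes all 2^N dichotomies of the point set."""
    realized = {tuple(c(p) for p in points) for c in classifiers}
    return len(realized) == 2 ** len(points)

# Family of 1-D threshold functions x -> [x > theta]; its VC dimension is 1:
# any single point is shattered, but no pair is, since the labeling that
# marks only the smaller point positive is never realized.
thresholds = [lambda x, t=t: int(x > t) for t in (-0.5, 0.5, 1.5)]
print(shatters([0.0], thresholds))        # True
print(shatters([0.0, 1.0], thresholds))   # False
```

The same exhaustive check applies to any finite family, though for real networks one falls back on the analytical bounds listed above.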

40 Confidence Interval
A finite VC dimension is a necessary and sufficient condition for the uniform convergence of the principle of empirical risk minimization. Training error vs. generalization error: the generalization error is bounded by the training error plus a confidence interval that grows with the VC dimension.

41 Guaranteed risk
Guaranteed risk (a bound on the generalization error) = training error + confidence interval. As the VC dimension h grows, the training error decreases but the confidence interval widens: small h gives an overdetermined machine, large h an underdetermined one, and the guaranteed risk is minimized at an intermediate h.

42 Probably Approximately Correct Learning Model
The goal of PAC learning is to ensure that ν_train, the probability of disagreement between the trained hypothesis and the concept to be learned, is usually small. Parameters: the size of the training sample, N; the error parameter ε allowed in the approximation of the target concept; and the confidence parameter δ, governing the likelihood of a successful approximation. How many random examples are needed to learn an unknown concept, given VCdim h, ε, and δ? N ≥ (K/ε)(h ln(1/ε) + ln(1/δ)) for some constant K; for a finite hypothesis class H, N ≥ (1/ε)(ln|H| + ln(1/δ)).
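For the finite-hypothesis-class form of the bound, the sample size can be computed directly. This is a standard simplification of the VC-based bound; the numbers are only an example:

```python
import math

def pac_sample_size(eps, delta, num_hypotheses):
    """N >= (1/eps) * (ln|H| + ln(1/delta)) examples suffice so that, with
    probability at least 1 - delta, any consistent hypothesis from a finite
    class H has true error below eps."""
    return math.ceil((math.log(num_hypotheses) + math.log(1.0 / delta)) / eps)

print(pac_sample_size(0.1, 0.05, 1000))   # 100
```

Tightening ε or δ, or enlarging H, only grows N logarithmically in |H| and 1/δ but linearly in 1/ε, which is the qualitative message of the PAC model.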

