Learning Processes
CS679 Lecture Note by Jin Hyung Kim, Computer Science Department, KAIST

Learning
- Learning is a process by which the free parameters of a neural network are adapted through stimulation from the environment.
- Sequence of events: the network is stimulated by an environment, undergoes changes in its free parameters, and responds in a new way to the environment.
- Learning algorithm: a prescribed set of steps that makes a system learn, i.e., ways to adjust the synaptic weights of a neuron. There is no unique learning algorithm; we have a kit of tools.
- This chapter covers five learning rules, learning paradigms, issues of learning tasks, and the probabilistic and statistical aspects of learning.

Error-Correction Learning (I)
- Error signal: e_k(n) = d_k(n) - y_k(n), where n denotes the time step.
- The error signal activates a control mechanism for corrective adjustment of the synaptic weights.
- The goal is minimizing a cost function E(n), an index of performance, also called the instantaneous value of the error energy.
- Adjustment proceeds step by step until the system reaches a steady state, i.e., the synaptic weights are stabilized.
- Also called the delta rule or Widrow-Hoff rule.

Error-Correction Learning (II)
- Weight adjustment: Δw_kj(n) = η e_k(n) x_j(n), where η is the rate of learning (learning-rate parameter).
- Update: w_kj(n+1) = w_kj(n) + Δw_kj(n), so w_kj(n) = z⁻¹[w_kj(n+1)], where z⁻¹ is the unit-delay operator.
- The adjustment is proportional to the product of the error signal and the input signal, so error-correction learning is local.
- The learning rate η determines the stability and convergence of the process.
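
The update above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture note; the input vector, target d, and learning rate η are made-up values.

```python
# A minimal sketch of the delta (Widrow-Hoff) rule for a single linear
# neuron; the input, target d, and learning rate eta are made-up values.

def delta_rule_step(w, x, d, eta=0.1):
    """One update: w_j <- w_j + eta * e * x_j, with e = d - y."""
    y = sum(wj * xj for wj, xj in zip(w, x))    # neuron output y_k(n)
    e = d - y                                   # error signal e_k(n)
    return [wj + eta * e * xj for wj, xj in zip(w, x)], e

w, e = [0.0, 0.0], None
for _ in range(100):                            # step-by-step adjustment until
    w, e = delta_rule_step(w, [1.0, 0.5], 2.0)  # the weights stabilize
print(abs(e) < 1e-3)  # → True
```

With this η the error shrinks geometrically; a learning rate that is too large would instead make the updates diverge, which is the stability issue the slide mentions.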

Memory-Based Learning
- Past experiences are stored in a memory of correctly classified input-output examples; classification retrieves and analyzes the "local neighborhood" of a test input.
- Essential ingredients: the criterion used for defining the local neighborhood, and the learning rule applied to the training examples.
- Nearest Neighbor Rule (NNR): the vector x'_N ∈ {x_1, x_2, …, x_N} is the nearest neighbor of x_test if min_i d(x_i, x_test) = d(x'_N, x_test).

Nearest Neighbor Rule
- Cover and Hart: if the examples are independent and identically distributed and the sample size N is infinitely large, then error(NNR) < 2 × error(Bayes rule); half of the classification information is contained in the nearest neighbor.
- k-Nearest Neighbor rule: a variant of NNR. Select the k nearest neighbors of x_test and use a majority vote; this acts like an averaging device and discriminates against outliers.
- The radial-basis function network is a memory-based classifier.
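
The majority-vote variant can be sketched directly. The toy points and labels below are made up for illustration; only the rule itself comes from the slide.

```python
# A sketch of the k-nearest-neighbor rule with a majority vote.
# The training points and labels are made-up toy data.
from collections import Counter

def knn_classify(train, x_test, k=3):
    """train: list of (vector, label). Vote among the k closest points."""
    by_dist = sorted(train,
                     key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x_test)))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

train = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
         ((1.0, 1.0), "B"), ((0.9, 1.1), "B")]
print(knn_classify(train, (0.05, 0.05), k=3))  # → A
```

With k = 1 a single mislabeled outlier could decide the class; the vote over k > 1 neighbors is what gives the averaging, outlier-resistant behavior the slide describes.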

Hebbian Learning
- If the two neurons of a connection are activated simultaneously (synchronously), the connection's strength is increased; if asynchronously, the strength is weakened or eliminated.
- A Hebbian synapse is: time dependent (depends on the exact times of occurrence of the two signals); local (only locally available information is used); interactive (learning is done by the interaction of two signals); and conjunctional or correlational (driven by the co-occurrence of the two signals).
- Hebbian learning is found in the hippocampus (presynaptic and postsynaptic signals).

Mathematical Models of Hebbian Modification
- General form: Δw_kj(n) = F(y_k(n), x_j(n)).
- Hebb's hypothesis: Δw_kj(n) = η y_k(n) x_j(n), where η is the rate of learning; also called the activity product rule. Repeated application of x_j leads to exponential growth of w_kj.
- Covariance hypothesis: Δw_kj(n) = η (x_j − x̄)(y_k − ȳ), where x̄ and ȳ are time averages.
  1. w_kj is enhanced if x_j > x̄ and y_k > ȳ.
  2. w_kj is depressed if (x_j > x̄ and y_k < ȳ) or (x_j < x̄ and y_k > ȳ).
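
Both hypotheses can be sketched for a single synapse. This is an illustration, not from the note; the constant signals and the rate η are made up.

```python
# Sketches of the activity-product rule and the covariance variant for one
# synapse; the constant signals and rate eta below are illustrative.

def hebb_update(w, x, y, eta=0.01):
    """Activity product rule: delta_w = eta * y * x."""
    return w + eta * y * x

def covariance_update(w, x, y, x_bar, y_bar, eta=0.01):
    """Covariance hypothesis: delta_w = eta * (x - x_bar) * (y - y_bar)."""
    return w + eta * (x - x_bar) * (y - y_bar)

w = 0.0
for _ in range(10):                    # repeated coincident activity keeps
    w = hebb_update(w, x=1.0, y=1.0)   # strengthening the synapse; growth is
print(round(w, 2))  # → 0.1            # unbounded without a constraint
```

Note how the plain product rule only ever grows the weight for positive signals (the exponential-growth problem the slide mentions), while the covariance rule can also depress it when one signal falls below its time average.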

Competitive Learning
- The output neurons of the network compete to become active; only a single neuron is active at any one time. This is a salient feature for pattern classification.
- Basic elements: a set of neurons that are identical except for their synaptic weight distributions, so they respond differently to a given set of input patterns; a limit on the strength of each neuron; and a mechanism that lets the neurons compete to respond to a given input (winner-takes-all).
- Neurons learn to respond to specialized conditions and become feature detectors.

Competitive Network Architecture
- Feedforward connections from the layer of source inputs to the single layer of output neurons are excitatory; the lateral connections among the output neurons are inhibitory (lateral inhibition).

Competitive Learning Rule
- Output signal: y_k = 1 if v_k > v_j for all j, j ≠ k, and y_k = 0 otherwise, where w_kj denotes the synaptic weight from input j to neuron k.
- Competitive learning rule: the winning neuron moves its weight vector toward the input, Δw_kj = η (x_j − w_kj) if neuron k wins the competition, and Δw_kj = 0 otherwise.
- If a neuron does not respond to a particular input, no learning takes place in that neuron.

Competitive Learning: Geometric Interpretation
- If each input vector x has a constant Euclidean length, the network performs clustering through competitive learning: each weight vector migrates toward the center of the cluster of inputs that its neuron wins.
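
The winner-takes-all scheme can be sketched as follows. The two-cluster toy data and the rate η are made up; the rule is the standard "only the winner learns" update.

```python
# A minimal sketch of winner-takes-all competitive learning;
# the two-cluster toy data and learning rate are made up.

def winner(weights, x):
    """Index of the neuron whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda k: sum((w - xi) ** 2 for w, xi in zip(weights[k], x)))

def competitive_step(weights, x, eta=0.5):
    """Only the winning neuron learns: w_k <- w_k + eta * (x - w_k)."""
    k = winner(weights, x)
    weights[k] = [w + eta * (xi - w) for w, xi in zip(weights[k], x)]
    return weights

weights = [[0.0, 0.0], [1.0, 1.0]]                       # two competing neurons
data = [(0.1, 0.0), (0.9, 1.0), (0.0, 0.1), (1.0, 0.9)] * 20
for x in data:
    weights = competitive_step(weights, x)
# each weight vector has drifted toward the centroid of the cluster it wins
```

After training, each neuron fires only for inputs near "its" cluster, which is the clustering behavior described on the slide.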

Boltzmann Learning
- Rooted in statistical mechanics. A Boltzmann machine is a neural network based on Boltzmann learning.
- The neurons constitute a recurrent structure and operate in a binary manner: "on" = +1 and "off" = −1. There are visible neurons and hidden neurons.
- Energy function: E = −(1/2) Σ_j Σ_k w_kj x_k x_j with j ≠ k, i.e., there is no self-feedback.
- The goal of Boltzmann learning is to maximize the likelihood function; a set of synaptic weights is called a model of the environment if it leads to the same probability distribution over the states of the visible units.

Boltzmann Machine Operation
- Choose a neuron k at random and flip its state from x_k to −x_k with probability P(x_k → −x_k) = 1 / (1 + exp(−ΔE_k / T)), where ΔE_k is the energy change resulting from the flip and T is the pseudo-temperature.
- If this rule is applied repeatedly, the machine reaches thermal equilibrium.
- Two modes of operation: the clamped condition, in which the visible neurons are clamped onto specific states, and the free-running condition, in which all neurons are allowed to operate freely.
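
One stochastic update step can be sketched as below. This follows the standard formulation rather than the note verbatim; the weights are illustrative, and the sign convention for ΔE_k (here, energy released by the flip) varies across texts.

```python
# A sketch of one stochastic update of a Boltzmann machine with +/-1 states.
# Weights are illustrative; delta_E here is the energy released by the flip.
import math
import random

def flip_probability(delta_E, T=1.0):
    """P(x_k -> -x_k) = 1 / (1 + exp(-delta_E / T))."""
    return 1.0 / (1.0 + math.exp(-delta_E / T))

def energy(w, x):
    """E = -1/2 * sum over j != k of w[k][j] * x[k] * x[j] (no self-feedback)."""
    return -0.5 * sum(w[k][j] * x[k] * x[j]
                      for k in range(len(x))
                      for j in range(len(x)) if j != k)

def boltzmann_step(w, x, T=1.0):
    """Pick a neuron at random and flip it with the Boltzmann probability."""
    k = random.randrange(len(x))
    flipped = list(x)
    flipped[k] = -flipped[k]
    delta_E = energy(w, x) - energy(w, flipped)  # energy drop achieved by flip
    return flipped if random.random() < flip_probability(delta_E, T) else list(x)

w = [[0.0, 1.0], [1.0, 0.0]]           # symmetric weights, zero self-feedback
state = [1, -1]
for _ in range(100):                   # repeated application approaches
    state = boltzmann_step(w, state)   # thermal equilibrium
```

At ΔE = 0 the flip probability is exactly 1/2, and energy-lowering flips become near-certain as T → 0, which is why repeated application settles toward low-energy states.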

Boltzmann Learning Rule
- Let ρ⁺_kj denote the correlation between the states of neurons j and k in the clamped condition, and ρ⁻_kj the correlation between them in the free-running condition.
- Boltzmann learning rule (Hinton and Sejnowski, 1986): Δw_kj = η (ρ⁺_kj − ρ⁻_kj), where η is the learning-rate parameter.

Credit-Assignment Problem
- The problem of assigning credit or blame for overall outcomes to each of the internal decisions.
- Two sub-problems: the temporal credit-assignment problem (many actions are taken over time; which action is responsible for the outcome?) and the structural credit-assignment problem (multiple internal components contribute; which one did best?).
- The credit-assignment problem arises in error-correction learning in multilayer feedforward networks: how do we credit or blame the actions of the hidden neurons?

Learning with a Teacher (Supervised Learning)
- The teacher has knowledge of the environment to be learned; input and desired-output pairs are given as a training set.
- Parameters are adjusted step by step based on an error signal.
- System performance measure: the mean squared error, or the sum of squared errors over the training sample, visualized as an error surface with the parameters as coordinates.
- Learning moves toward a minimum point of the error surface, which may not be a global minimum, using the gradient of the error surface, i.e., the direction of steepest descent.
- Good for pattern recognition and function approximation.

Learning without a Teacher: Reinforcement Learning
- No teacher provides a direct (desired) response at each step; a critic gives only evaluative feedback, e.g., good/bad or win/lose.
- The learner must solve the temporal credit-assignment problem, since the critic's feedback may be given only at the time of the final outcome.
- Structure: the environment delivers primary reinforcement to the critic, which converts it into heuristic reinforcement for the learning system.

Unsupervised Learning (Self-Organized Learning)
- No external teacher or critic.
- A task-independent measure of quality is required for learning; the network parameters are optimized with respect to that measure.
- The competitive learning rule is a case of unsupervised learning.

Learning Tasks
- Pattern association
- Pattern recognition
- Function approximation
- Control
- Filtering
- Beamforming

Pattern Association
- An associative memory is a distributed memory that learns by association, a predominant feature of human memory.
- It stores pattern pairs x_k → y_k, k = 1, 2, …, q; q is the storage capacity.
- There are a storage phase and a recall phase: the network stores a set of patterns through repeated presentation, and a partial description or distorted pattern is later used to retrieve the particular stored pattern; x_k acts as a stimulus that determines the location of the memorized pattern y_k.
- Autoassociation: x_k = y_k. Heteroassociation: x_k ≠ y_k.
- The design goal is to make q as large as possible while still recalling correctly.

Pattern Recognition
- The process whereby a received pattern is assigned to one of a set of prescribed classes (categories).
- Pattern recognition by a neural network is statistical in nature: a pattern is a point in a multidimensional decision space, which is divided into regions associated with the classes by decision boundaries; the decision boundaries are determined by training.
- Typical pipeline: input pattern → feature extraction (unsupervised) → feature vector of reduced dimension → classification (supervised) → output class.

Function Approximation
- Given a set of labeled examples {(x_i, d_i)} drawn from an unknown function f(·), find a mapping F(·) that approximates f(·) such that ‖F(x) − f(x)‖ < ε for all x.
- Supervised learning is well suited to this task.
- System identification: construct a model of an unknown system from input-output measurements (input x_i, system output d_i, model output y_i, error e_i = d_i − y_i); one can likewise construct an inverse of the system.
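
As a concrete sketch of the task, with made-up data and ordinary least squares standing in for the learner, fitting F(x) = a·x + b to labeled examples looks like this:

```python
# A sketch of function approximation from labeled examples (x_i, d_i):
# least-squares fit of F(x) = a*x + b. The sample data are made up.

def fit_line(xs, ds):
    """Closed-form least-squares estimates of slope a and intercept b."""
    n = len(xs)
    x_mean = sum(xs) / n
    d_mean = sum(ds) / n
    a = sum((x - x_mean) * (d - d_mean) for x, d in zip(xs, ds)) / \
        sum((x - x_mean) ** 2 for x in xs)
    return a, d_mean - a * x_mean

# noiseless samples of f(x) = 2x + 1; with noisy d_i, F only approximates f
a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a, b)  # → 2.0 1.0
```

Here the noiseless data let F recover f exactly; with noise on the labels d_i, the fitted F satisfies the ε-approximation requirement only approximately, which is exactly the supervised-learning setting the slide describes.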

Control
- In a feedback control system, the controller drives the plant so that the plant output y tracks a reference signal d, using the error e = d − y to produce the plant input u.
- Error-correction learning requires the Jacobian matrix of the plant. Two approaches: indirect learning, in which input-output measurements on the plant are used to construct a neural-network model of the plant, and that model is used to estimate the Jacobian; and direct learning.

Filtering
- Extract information from a set of noisy data. Three information-processing tasks: filtering (using data measured up to and including time n), smoothing (using data measured up to and after time n), and prediction (predict the value at time n + n₀, n₀ > 0, using data measured up to time n).
- Cocktail-party problem: focusing on one speaker in a noisy environment; a preattentive, preconscious analysis is believed to be involved.

Blind Source Separation
- Unknown source signals u_1(n), …, u_m(n) pass through an unknown mixer A in the environment, producing the observations x_1(n), …, x_m(n); a demixer W recovers the outputs y_1(n), …, y_m(n) without knowledge of A or of the sources.

Memory
- Neurobiological definition: relatively enduring neural alterations induced by interaction with the environment, accessible to influence future behavior.
- An activity pattern is stored by the learning process; memory and learning are thus connected.
- Short-term memory: a compilation of knowledge representing the current state of the environment. Long-term memory: knowledge stored for a long time or permanently.

Associative Memory
- Memory is distributed: the stimulus pattern and the response pattern are data vectors, and information is stored as spatial patterns of neural activity.
- The information in a stimulus pattern determines not only the storage location but also the address for its retrieval.
- Although individual neurons are not reliable, the memory is.
- Stored patterns may interact with one another (it is not a simple storage of patterns), so there is a possibility of error in the recall process.

Correlation Matrix Memory
- Associates a key vector x_k with a memorized vector y_k: y_k = W(k) x_k, k = 1, 2, …, q.
- The total experience is the sum M_q = Σ_{k=1}^q W(k).
- Adding W(k) to M_{k−1} loses the distinct identity of the individual contributions that form M_k, but the information about the stimuli is not lost.

Correlation Matrix Memory: Learning
- Estimate of the memory matrix: M̂ = Σ_{k=1}^q y_k x_kᵀ, called the outer product rule, a generalization of Hebb's postulate of learning.

Recall
- Normalize each key vector to have unit Euclidean length. The response to key x_j is then y = M̂ x_j = Σ_{k=1}^q (x_kᵀ x_j) y_k = y_j + v_j, where v_j = Σ_{k≠j} (x_kᵀ x_j) y_k is the noise vector arising from crosstalk among the keys.
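
Storage by the outer product rule and noise-free recall can be sketched with toy dimensions; the key and memorized vectors below are made up.

```python
# A sketch of a correlation matrix memory built by the outer product rule,
# M = sum_k y_k x_k^T, with recall y = M x_j. Toy vectors, pure stdlib.

def outer(y, x):
    return [[yi * xj for xj in x] for yi in y]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_vec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

keys = [[1.0, 0.0], [0.0, 1.0]]       # orthonormal key vectors x_k
mems = [[2.0, 3.0], [4.0, 5.0]]       # memorized vectors y_k
M = [[0.0, 0.0], [0.0, 0.0]]
for x, y in zip(keys, mems):
    M = mat_add(M, outer(y, x))       # each W(k) = y_k x_k^T joins the mixture
print(mat_vec(M, keys[0]))  # → [2.0, 3.0]  (orthonormal keys: zero noise vector)
```

Because the two keys are orthonormal, every cross term x_kᵀ x_j (k ≠ j) vanishes and recall is exact; with merely similar keys the noise vector v_j would contaminate the response.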

Orthogonality and Storage Capacity
- If the key vectors form an orthonormal set (x_kᵀ x_j = 1 for k = j and 0 otherwise), recall is perfect: M̂ x_j = y_j.
- Storage capacity is given by the rank of the memory matrix (the number of independent columns) and is therefore limited by the dimensionality of the input space.

Errors of Associative Memory
- Real-life key patterns are neither orthogonal nor highly separated, so the lower bound on the community (overlap) of the key vectors is not zero; if that overlap is large, the memory may fail to distinguish the stored patterns.
- The memory may then act as if it recognizes patterns it has never seen before, a behavior termed "animal logic."

Adaptation
- Space and time are fundamental dimensions of learning; learning has a spatiotemporal nature.
- Stationary environment: the learned parameters can be frozen and used later.
- Nonstationary environment: the system must continually adapt its parameters to variations in the incoming signal in a real-time fashion; continuous learning, or learning-on-the-fly.
- Pseudostationary: treat the signal as stationary over a short time window, e.g., speech signals over 10-30 ms, or weather forecasting over a period of minutes.

Statistical Nature of the Learning Process
- The training sample is denoted by T = {(x_i, d_i)}, i = 1, …, N. The functional relationship between X and D is the regressive model D = f(X) + ε, with f a deterministic function of the random variable X and ε an expectational error.
- The error ε has zero mean and is uncorrelated with the regression function f(X) (principle of orthogonality).

Minimizing the Cost Function
- With E the statistical averaging (expectation) operator, the mean-square error is a natural measure of effectiveness, since minimizing it over the network's free parameters is equivalent to minimizing the expected squared distance between the regression function f(X) and the network's approximating function F(X, w).

Bias/Variance Dilemma
- Bias represents the inability of the network to approximate the regression function; variance represents the inadequacy of the information contained in the training set.
- We can reduce both bias and variance only with large training samples.
- Alternatively, purposely introduce a "harmless," designed bias to reduce variance: a constrained network architecture that takes prior knowledge into account.
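
The trade-off can be made concrete with a toy simulation (all numbers made up): a rigid estimator that ignores the data versus a flexible one that follows it.

```python
# A toy bias/variance illustration: estimate a target value from noisy
# samples with a rigid estimator (high bias, zero variance) and a flexible
# one (low bias, higher variance). Noise model and sizes are made up.
import random

random.seed(0)
TARGET = 1.0

def draw_sample(n=5):
    return [TARGET + random.gauss(0.0, 1.0) for _ in range(n)]

def bias2_and_variance(estimator, trials=2000):
    ests = [estimator(draw_sample()) for _ in range(trials)]
    mean = sum(ests) / len(ests)
    bias2 = (mean - TARGET) ** 2                       # (squared) bias
    var = sum((e - mean) ** 2 for e in ests) / len(ests)  # variance over samples
    return bias2, var

b_rigid, v_rigid = bias2_and_variance(lambda s: 0.0)            # ignores data
b_flex, v_flex = bias2_and_variance(lambda s: sum(s) / len(s))  # sample mean
print(b_rigid > b_flex and v_rigid < v_flex)  # → True
```

Enlarging each training sample (n) shrinks the flexible estimator's variance, matching the slide's point that only large training samples reduce both terms at once.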

Statistical Learning Theory
- Addresses the fundamental issue of controlling the generalization ability of a learning machine.
- Principle of empirical risk minimization: minimize the empirical risk function over the training sample; uniform convergence of the empirical risk to the true risk is required.
- As training progresses, approximating functions F(x_i, w) that are consistent with the training data become increasingly likely.

Vapnik-Chervonenkis Dimension
- Measures the capacity, or expressive power, of the family of classification functions realized by a learning machine.
- The VC dimension, VCdim(F), of an ensemble of dichotomies F is the cardinality of the largest set L that is shattered by F, i.e., the maximum number of training examples that can be shattered by the machine without error.
- Example: 4 points in general position in 2D space; if the number of dichotomies Δ(L) is 16 = 2⁴, i.e., every dichotomy is realizable, then VCdim = max N with Δ(L) = 2^N, which is 4.
- Evaluating VCdim(·) is a difficult task, but its bounds are often tractable: for a feedforward network with W weights, VCdim is O(W log W) with threshold units, O(W²) with sigmoid units, and O(W) with purely linear units.

Confidence Interval
- A finite VC dimension is a necessary and sufficient condition for the uniform convergence of the principle of empirical risk minimization.
- The generalization error is bounded by the training error plus a confidence interval that grows with the VC dimension and shrinks with the training-sample size.

Guaranteed Risk
- The guaranteed risk (the bound on the generalization error) is the sum of the training error and the confidence interval.
- As the VC dimension h increases, the training error decreases but the confidence interval grows: a small h leaves the model underdetermined, a large h overdetermined; the best h minimizes the guaranteed risk.

Probably Approximately Correct (PAC) Learning Model
- The goal of PAC learning is to ensure that ν_train, the probability of a difference between the trained hypothesis and the concept to be learned, is usually small.
- Parameters: the size of the training sample, N; the error parameter ε allowed in the approximation of the target concept; and the confidence parameter δ, the likelihood of achieving that approximation.
- How many random examples are needed to learn an unknown concept? For a hypothesis class of VC dimension h, a sample of size N on the order of (K/ε)(h ln(1/ε) + ln(1/δ)) suffices, for some constant K; for a finite hypothesis class H, N ≥ (1/ε)(ln|H| + ln(1/δ)) suffices.
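
The finite-class bound is easy to evaluate numerically. The sketch below uses the standard sample-size bound for a consistent learner over a finite hypothesis class; the plugged-in values of |H|, ε, and δ are illustrative.

```python
# Sketch of the standard PAC sample-size bound for a finite hypothesis
# class H: N >= (1/eps) * (ln|H| + ln(1/delta)) suffices for a consistent
# learner. The numbers plugged in below are illustrative.
import math

def pac_sample_size(H_size, eps, delta):
    """Smallest integer N satisfying the finite-class PAC bound."""
    return math.ceil((math.log(H_size) + math.log(1.0 / delta)) / eps)

n = pac_sample_size(H_size=1000, eps=0.05, delta=0.01)
print(n)  # → 231
```

Note that the bound grows only logarithmically in |H| and in 1/δ, but linearly in 1/ε: halving the allowed error ε roughly doubles the required sample size.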