
1 3. Learning
In the previous lecture, we discussed the biological foundations of neural computation, including:
- single neuron models
- connecting single-neuron behaviour with network models
- spiking neural networks
- computational neuroscience

2–4 In the present lecture, we introduce the statistical foundations of neural computation (= the artificial foundations of neural computation).
Artificial Neural Networks have:
- biological foundations (neuroscience)
- artificial foundations (statistics, mathematics)
Like a duck, they can:
- swim, but not like a fish (Feng)
- fly, but not like a bird (all my colleagues here)
- walk, in a funny way

5 Topics: pattern recognition; clusters; the statistical approach.

6 Statistical learning (training from a data set, adaptation): change the weights, or interactions between neurons, according to examples and previous knowledge.
The purpose of learning is to minimize:
- training errors on the learning data

7 Learning (training from a data set, adaptation).
The purpose of learning is to minimize:
- training errors on the learning data: the learning error
- prediction errors on new, unseen data: the generalization error

8 Learning (training from a data set, adaptation).
The purpose of learning is to minimize:
- training errors
- prediction errors
The neuroscience basis of learning remains elusive, although we have seen some progress (see the references in the previous lecture).

9–10 LEARNING: extracting principles from a data set.
- Supervised learning: there is a teacher telling you where to go.
- Unsupervised learning: no teacher; the system learns by itself.
- Reinforcement learning: there is a critic, saying only whether you are wrong or correct.
- Statistical learning: the artificial, reasonable way of training and prediction.
We will concentrate on the first two. Reinforcement learning is covered in the books by Haykin and by Hertz et al., or in Sutton R.S. and Barto A.G. (1998), Reinforcement Learning: An Introduction, Cambridge, MA: MIT Press.

11–12 Pattern recognition (classification), a special case of learning.
The simplest case: f(x) = 1 or -1 for x in X (the set of objects we intend to separate).
For example: X is a collection of faces, x is a single face, and f assigns each face image to one of the two classes (face images shown on the slide).

13 Pattern: as opposed to chaos; it is an entity, vaguely defined, that could be given a name.
Examples: a fingerprint image, a handwritten word, a human face, a speech signal, an iris pattern, etc.

14 Given a pattern, there are two kinds of classification:
a. supervised classification (discriminant analysis), in which the input pattern is identified as a member of a predefined class;
b. unsupervised classification (e.g. clustering), in which the pattern is assigned to a hitherto unknown class.
Unsupervised classification will be introduced in later lectures.

15 Pattern recognition is the process of assigning patterns to one of a number of classes.
(Diagram: pattern space (data) -> feature extraction -> feature space.)

16 Feature extraction: from the raw pattern x (an image, shown on the slide), extract the feature y = hair length, e.g. y = 0 versus y = 30 cm.

17 Pattern recognition is the process of assigning patterns to one of a number of classes.
(Diagram: pattern space (data) -> feature extraction -> feature space -> classification -> decision space.)

18 Feature extraction: hair length = 0 versus hair length = 30 cm.
Classification: short hair = male, long hair = female.

19 Feature extraction is a very fundamental issue.
For example, when we recognize a face, which features do we use? The eye pattern, the geometric outline, etc.

20 Two approaches:
- Statistical approach
- Clusters: template matching
In two steps:
1. Find a discriminant function in terms of certain features.
2. Make a decision in terms of the discriminant function.
Discriminant function: a function used to decide on class membership (a minimal sketch follows below).
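
A minimal Python sketch of the two-step procedure, using the lecture's hair-length toy example. The 15 cm threshold and the sample values are illustrative assumptions, not values from the lecture.

```python
import numpy as np

# Step 1: a discriminant function g(x) built from the feature (hair length).
# Step 2: decide class membership from the sign of g(x).
# The 15 cm threshold and the sample values are illustrative assumptions.

def g(x, threshold=15.0):
    """Discriminant function: positive for one class, negative for the other."""
    return x - threshold

def classify(x):
    return "long hair (class 2)" if g(x) > 0 else "short hair (class 1)"

hair_lengths = np.array([0.0, 2.0, 12.0, 25.0, 30.0])   # cm
for x in hair_lengths:
    print(f"hair length {x:4.1f} cm -> {classify(x)}")
```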

21 Cluster: patterns of a class should be grouped, or clustered, together in pattern or feature space if the decision space is to be partitioned:
- objects near together must be similar
- objects far apart must be dissimilar
Distance measures: the choice of distance becomes important as a basis for classification. Once a distance is given, pattern recognition can be accomplished (a template-matching sketch follows below).
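
A nearest-centroid (template-matching) sketch in Python: each class is represented by the mean of its training patterns, and a new pattern is assigned to the class whose template is closest. The two synthetic 2-D clusters and the choice of Euclidean distance are assumptions for illustration.

```python
import numpy as np

# Nearest-centroid (template-matching) classification on synthetic 2-D data.

rng = np.random.default_rng(0)
class_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(20, 2))
class_b = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(20, 2))

# Each class template is the centroid (mean) of its training patterns.
templates = {"A": class_a.mean(axis=0), "B": class_b.mean(axis=0)}

def nearest_template(x):
    """Assign x to the class whose template (centroid) is closest."""
    return min(templates, key=lambda c: np.linalg.norm(x - templates[c]))

print(nearest_template(np.array([0.2, -0.1])))   # expected: A
print(nearest_template(np.array([2.8, 3.1])))    # expected: B
```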

22 (Figure: hair length.)

23–24 Distance metrics: different distances will be employed later.
To be a valid metric of the distance between two objects in an abstract space W, a distance measure d must satisfy the following conditions:
- d(x,y) >= 0 (non-negativity)
- d(x,x) = 0 (reflexivity)
- d(x,y) = d(y,x) (symmetry)
- d(x,y) <= d(x,z) + d(z,y) (triangle inequality)
We will encounter different distances, for example the relative entropy (a "distance" from information theory).
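
A small numerical sanity check (not a proof) that the Euclidean distance satisfies the four axioms on randomly drawn vectors; the dimension and number of samples are arbitrary choices.

```python
import numpy as np

# Check the four metric axioms for the Euclidean distance on random vectors.

rng = np.random.default_rng(1)

def d(x, y):
    return np.linalg.norm(x - y)

for _ in range(1000):
    x, y, z = rng.normal(size=(3, 5))
    assert d(x, y) >= 0.0                          # non-negativity
    assert d(x, x) == 0.0                          # reflexivity
    assert np.isclose(d(x, y), d(y, x))            # symmetry
    assert d(x, y) <= d(x, z) + d(z, y) + 1e-12    # triangle inequality

print("all four axioms hold on the sampled points")
```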

25 Hamming distance. For x = {x_i} and y = {y_i},
d_H(x, y) = Σ_i |x_i - y_i|,
the sum of absolute differences between corresponding elements of the two vectors x and y. It is most often used for comparing binary vectors (binary pixel figures, black-and-white figures).
E.g. d_H([1 0 0 1 1 1 0 1], [1 1 0 1 0 0 1 1]) = 4 (the element-wise differences are (0 1 0 0 1 1 1 0)).
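
A short Python version of the Hamming distance, reproducing the slide's example.

```python
import numpy as np

# Hamming distance between two binary vectors.

def hamming(x, y):
    """Sum of absolute element-wise differences (number of mismatching bits)."""
    return int(np.sum(np.abs(np.asarray(x) - np.asarray(y))))

x = [1, 0, 0, 1, 1, 1, 0, 1]
y = [1, 1, 0, 1, 0, 0, 1, 1]
print(hamming(x, y))   # 4
```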

26 Euclidean distance. For x = {x_i} and y = {y_i},
d(x, y) = [ Σ_i (x_i - y_i)^2 ]^(1/2).
The most widely used distance; easy to calculate.
Minkowski distance. For x = {x_i} and y = {y_i},
d(x, y) = [ Σ_i |x_i - y_i|^r ]^(1/r), r > 0.
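
A short sketch of the Minkowski distance, which recovers the Euclidean distance for r = 2 and the city-block distance for r = 1; the example vectors are arbitrary.

```python
import numpy as np

# Minkowski distance; r = 2 gives the Euclidean distance, r = 1 the city-block distance.

def minkowski(x, y, r=2.0):
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.sum(np.abs(x - y) ** r) ** (1.0 / r)

x, y = [1.0, 2.0, 3.0], [4.0, 0.0, 3.0]
print(minkowski(x, y, r=2))   # Euclidean: sqrt(9 + 4 + 0) ~ 3.61
print(minkowski(x, y, r=1))   # city-block: 3 + 2 + 0 = 5
```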

27 Statistical approach. (Figure: hair length.)

28 Distribution densities p1(x) and p2(x).
If p1(x) > p2(x), then x is in class one; otherwise it is in class two.
The discriminant function is given by p1(x) = p2(x).
The problem of statistical pattern recognition is thus reduced to estimating the probability densities from the given data {x} and {y}.
In general there are two approaches:
- parametric methods
- nonparametric methods
A minimal sketch of the density-comparison rule follows below.
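
A parametric sketch of the density-comparison rule, assuming Gaussian class densities fitted to synthetic hair-length samples (the sample values are made up for illustration); it uses scipy.stats.norm for the fitted densities.

```python
import numpy as np
from scipy.stats import norm

# Fit a normal density to each class's samples, then assign a new x to the
# class whose density p_i(x) is larger. Sample values are synthetic.

male_lengths = np.array([1.0, 2.0, 3.0, 5.0, 4.0])         # class 1 samples (cm)
female_lengths = np.array([20.0, 25.0, 30.0, 22.0, 28.0])  # class 2 samples (cm)

p1 = norm(male_lengths.mean(), male_lengths.std(ddof=1))
p2 = norm(female_lengths.mean(), female_lengths.std(ddof=1))

def classify(x):
    """Decide class membership by comparing the two fitted densities at x."""
    return "class 1" if p1.pdf(x) > p2.pdf(x) else "class 2"

for x in [2.0, 26.0]:
    print(f"x = {x:4.1f} cm -> {classify(x)}")
```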

29 Parametric methods assume knowledge of the form of the underlying probability density p(x).
Advantage: one only needs to adjust the parameters of the assumed distribution to obtain the best fit. By the central limit theorem, we can in many cases assume that the distribution is Gaussian (see below).
Disadvantage: if the assumption is wrong, performance is poor in terms of misclassification. However, if a crude classification is acceptable, this can be OK.

30 Normal (Gaussian) probability distribution: a common assumption is that the density is normal.
For a single variable X:
mean E[X] = μ, variance E[(X - E[X])^2] = σ².
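
A small sketch of the parametric approach in one dimension: estimate μ and σ² from a synthetic sample and evaluate the fitted normal density.

```python
import numpy as np

# Estimate mu and sigma^2 from data, then evaluate the fitted normal density.
# The sample is synthetic, drawn here only for illustration.

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=1000)

mu_hat = x.mean()            # estimate of the mean E[X] = mu
sigma2_hat = x.var(ddof=1)   # estimate of the variance E[(X - E[X])^2] = sigma^2

def gaussian_pdf(t, mu, sigma2):
    """Normal density with mean mu and variance sigma2."""
    return np.exp(-(t - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

print(mu_hat, sigma2_hat)                  # close to 5.0 and 4.0
print(gaussian_pdf(5.0, mu_hat, sigma2_hat))
```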

31 For multiple dimensions:
x is the feature vector, μ the mean vector, and Σ the covariance matrix, an n×n symmetric matrix with
Σ_ij = E[(X_i - μ_i)(X_j - μ_j)], the covariance between X_i and X_j.
|Σ| is the determinant of Σ, and Σ^(-1) the inverse of Σ.
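
A short sketch estimating the mean vector and covariance matrix Σ from synthetic 2-D data, together with |Σ| and Σ^(-1) as used in the multivariate normal density; the data and the true covariance are illustrative assumptions.

```python
import numpy as np

# Estimate mu and Sigma from a 2-D sample, then compute |Sigma| and Sigma^{-1}.

rng = np.random.default_rng(3)
X = rng.multivariate_normal(mean=[1.0, -1.0],
                            cov=[[2.0, 0.8], [0.8, 1.0]], size=2000)

mu = X.mean(axis=0)                # mean vector
Sigma = np.cov(X, rowvar=False)    # n x n symmetric covariance matrix
det_Sigma = np.linalg.det(Sigma)   # |Sigma|
Sigma_inv = np.linalg.inv(Sigma)   # Sigma^{-1}

print(mu)
print(Sigma)
print(det_Sigma)
```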

33 Mahalanobis distance (figure on the slide, with axes u1 and u2).
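
The standard definition is d(x, μ) = sqrt((x - μ)^T Σ^(-1) (x - μ)), a distance measured in units of the data's covariance; the sketch below uses synthetic data as an illustrative assumption.

```python
import numpy as np

# Mahalanobis distance on a synthetic 2-D sample, compared with the
# plain Euclidean distance from the sample mean.

rng = np.random.default_rng(4)
X = rng.multivariate_normal(mean=[0.0, 0.0],
                            cov=[[2.0, 0.8], [0.8, 1.0]], size=2000)
mu = X.mean(axis=0)
Sigma_inv = np.linalg.inv(np.cov(X, rowvar=False))

def mahalanobis(x, mu, Sigma_inv):
    diff = x - mu
    return float(np.sqrt(diff @ Sigma_inv @ diff))

x = np.array([1.0, 1.0])
print(mahalanobis(x, mu, Sigma_inv))   # covariance-aware distance
print(np.linalg.norm(x - mu))          # plain Euclidean distance, for comparison
```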

34 Topic: the Hebbian learning rule.

35 The Hebbian learning rule is local: it involves only the two neurons concerned, independently of other variables.
We will return to the Hebbian learning rule later in the course, in PCA learning.
There are other possible forms of learning that have been demonstrated experimentally (see Nature Neuroscience, as in the previous lecture).

36 Biological learning vs. statistical learning.
Biological learning: the Hebbian learning rule.
"When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
A and B: cooperation between two neurons.
In mathematical terms, with w(t) the weight between the two neurons at time t:
w(t+1) = w(t) + η r_A r_B,
where r_A and r_B are the activities of the two neurons and η is a learning-rate constant.
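
A minimal sketch of the Hebbian update with an assumed learning rate η and Poisson activities for the two neurons. Note that this plain rule lets the weight grow without bound, which is one reason normalized variants appear later in connection with PCA learning.

```python
import numpy as np

# Hebbian update w(t+1) = w(t) + eta * r_A * r_B: the weight grows whenever
# the presynaptic neuron A and the postsynaptic neuron B are active together.
# The learning rate eta and the Poisson activities are illustrative assumptions.

rng = np.random.default_rng(5)
eta = 0.01      # learning rate (assumed value)
w = 0.1         # initial weight between A and B

for t in range(100):
    r_A = rng.poisson(5.0)          # activity of neuron A at time t
    r_B = rng.poisson(5.0)          # activity of neuron B at time t
    w = w + eta * r_A * r_B         # correlated firing strengthens the weight

print(w)
```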
