3. Learning
In the previous lecture, we discussed the biological foundations of neural computation, including:
- single neuron models
- connecting single-neuron behaviour with network models
- spiking neural networks
- computational neuroscience

In the present one, we introduce the statistical foundations of neural computation, i.e. the artificial foundations of neural computation. Artificial neural networks rest on two kinds of foundations: biological foundations (neuroscience) and artificial foundations (statistics, mathematics). The duck analogy: a duck can swim (but not like a fish) (Feng), fly (but not like a bird) (all my colleagues here), and walk (in a funny way).

Topics: pattern recognition; clusters; the statistical approach.

Statistical learning (training from a data set, adaptation): change the weights, or the interactions between neurons, according to examples and previous knowledge. The purpose of learning is to minimize training errors on the learning data.

Learning (training from a data set, adaptation). The purpose of learning is to minimize:
- training errors on the learning data: the learning error
- prediction errors on new, unseen data: the generalization error
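A minimal sketch of these two error types, using made-up one-dimensional data and a simple least-squares line fit (Python with numpy; the data, the model, and the train/test split are illustrative assumptions, not part of the lecture):

import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: y = 2x + noise, split into a learning set and an unseen test set
x = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * x + 0.3 * rng.normal(size=200)
x_train, y_train = x[:150], y[:150]
x_test,  y_test  = x[150:], y[150:]

# Fit a straight line by least squares (a very simple "learned model")
A = np.vstack([x_train, np.ones_like(x_train)]).T
w, b = np.linalg.lstsq(A, y_train, rcond=None)[0]

def mse(xs, ys):
    # Mean squared error of the fitted line on the given data
    return np.mean((w * xs + b - ys) ** 2)

print("learning error (training data):    ", mse(x_train, y_train))
print("generalization error (unseen data):", mse(x_test, y_test))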

Learning (training from a data set, adaptation). The purpose of learning is to minimize training errors and prediction errors. The neuroscience basis of learning remains elusive, although we have seen some progress (see the references in the previous lecture).

LEARNING: extracting principles from a data set.
Supervised learning: there is a teacher telling you where to go.
Unsupervised learning: no teacher; the system learns by itself.
Reinforcement learning: there is a critic saying only whether you are wrong or correct.
Statistical learning: the artificial, reasonable way of training and prediction.
We will concentrate on the first two. Reinforcement learning is covered in the books by Haykin and by Hertz et al., and in Sutton, R.S. and Barto, A.G. (1998). Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.

Pattern recognition (classification), a special case of learning. The simplest case: f(x) = 1 or -1 for x in X (the set of objects we intend to separate). Example: X is a collection of faces and x is a single face; f assigns each face to one of the two classes.

Pattern: as opposed to chaos; it is an entity, vaguely defined, that could be given a name. Examples: a fingerprint image, a handwritten word, a human face, a speech signal, an iris pattern, etc.

Given a pattern, there are two kinds of classification:
a. supervised classification (discriminant analysis), in which the input pattern is identified as a member of a predefined class
b. unsupervised classification (e.g. clustering), in which the pattern is assigned to a hitherto unknown class
Unsupervised classification will be introduced in later lectures.

Pattern recognition is the process of assigning patterns to one of a number of classes. Feature extraction maps the pattern space (the data) x into the feature space y.

Feature extraction example: from an image x of a person, extract the feature "hair length" y, e.g. y = 0 for one person and y = 30 cm for another.

Pattern recognition is the process of assigning patterns to one of a number of classes. Feature extraction maps the pattern space (the data) into the feature space; classification then maps the feature space into the decision space.

Feature extraction: hair length = 0, hair length = 30 cm. Classification: short hair = male, long hair = female.
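A minimal sketch of this toy two-stage pipeline (Python; the 15 cm threshold and the sample records are illustrative assumptions, not values from the lecture):

# Toy pattern recognition pipeline: feature extraction followed by classification.

def extract_hair_length(person):
    # Feature extraction: here the "pattern" is a record with the feature already stored in it.
    return person["hair_length_cm"]

def classify(hair_length, threshold=15.0):
    # Classification: short hair -> "male", long hair -> "female" (the toy rule from the slide).
    return "male" if hair_length < threshold else "female"

people = [{"name": "A", "hair_length_cm": 0.0},
          {"name": "B", "hair_length_cm": 30.0}]

for p in people:
    y = extract_hair_length(p)      # pattern space -> feature space
    decision = classify(y)          # feature space -> decision space
    print(p["name"], y, decision)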

Feature extraction is a fundamental issue. For example, when we recognize a face, which features do we use? The eye pattern, the geometric outline, etc.

Two approaches: the statistical approach, and clusters (template matching). The statistical approach works in two steps: find a discriminant function in terms of certain features, then make a decision in terms of the discriminant function. A discriminant function is a function used to decide on class membership.

Clusters: patterns of a class should be grouped, or clustered, together in pattern or feature space. If the decision space is to be partitioned, objects near each other must be similar and objects far apart must be dissimilar, so the choice of distance measure becomes the basis of the classification. Once a distance is given, pattern recognition is accomplished.
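A minimal sketch of the cluster (template matching) idea: assign a new pattern to the class whose template, here simply the class mean, is nearest under Euclidean distance (Python with numpy; the 2-D feature vectors are illustrative assumptions):

import numpy as np

# Two classes of 2-D feature vectors (illustrative data)
class_a = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2]])
class_b = np.array([[1.0, 1.1], [0.9, 1.0], [1.1, 0.9]])

# Templates: one representative (the mean) per class
templates = {"A": class_a.mean(axis=0), "B": class_b.mean(axis=0)}

def classify(x):
    # Assign x to the class whose template is closest in Euclidean distance
    return min(templates, key=lambda c: np.linalg.norm(x - templates[c]))

print(classify(np.array([0.15, 0.05])))  # -> "A"
print(classify(np.array([0.95, 1.05])))  # -> "B"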

(Figure: hair length.)

Distance metrics: different distances will be employed later. To be a valid metric of the distance between two objects in an abstract space W, a distance d must satisfy the following conditions:
d(x,y) >= 0                  (non-negativity)
d(x,x) = 0                   (reflexivity)
d(x,y) = d(y,x)              (symmetry)
d(x,y) <= d(x,z) + d(z,y)    (triangle inequality)
We will encounter different distances, for example the relative entropy (a distance from information theory).

Hamming distance. For x = {x_i} and y = {y_i}, d_H(x, y) = sum_i |x_i - y_i|, the sum of absolute differences between corresponding elements of the two vectors x and y. It is most often used for comparing binary vectors (binary pixel figures, black-and-white images), where it counts the number of positions in which the two vectors differ; the example on the slide compares two binary vectors differing in 4 positions, so d_H = 4.

Euclidean distance. For x = {x_i} and y = {y_i}, d(x, y) = [ sum_i (x_i - y_i)^2 ]^(1/2); the most widely used distance, easy to calculate.
Minkowski distance. For x = {x_i} and y = {y_i}, d(x, y) = [ sum_i |x_i - y_i|^r ]^(1/r), r > 0. The Euclidean distance is the special case r = 2, and on binary vectors the case r = 1 coincides with the Hamming distance.
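A minimal sketch of these distance functions (Python with numpy; the example vectors are illustrative):

import numpy as np

def hamming(x, y):
    # Sum of absolute differences; on binary vectors this counts differing positions
    return np.sum(np.abs(x - y))

def minkowski(x, y, r):
    # General Minkowski distance, r > 0
    return np.sum(np.abs(x - y) ** r) ** (1.0 / r)

def euclidean(x, y):
    # Special case of the Minkowski distance with r = 2
    return minkowski(x, y, 2.0)

a = np.array([1, 0, 1, 1, 0, 1])
b = np.array([0, 1, 1, 0, 1, 1])
print(hamming(a, b))        # 4
print(euclidean(a, b))      # 2.0
print(minkowski(a, b, 1))   # 4.0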

Statistical approach. (Figure: distributions of hair length for the two classes.)

Distribution densities p_1(x) and p_2(x): if p_1(x) > p_2(x), then x is in class one; otherwise it is in class two. The discriminant function is given by p_1(x) = p_2(x) (the decision boundary). The problem of statistical pattern recognition is thus reduced to estimating the probability densities from the given data {x} and {y}. In general there are two approaches: parametric methods and nonparametric methods.

Parametric methods assume knowledge of the underlying probability density distribution p(x). Advantage: one need only adjust the parameters of the distribution to obtain the best fit; according to the central limit theorem, we can in many cases assume that the distribution is Gaussian (see below). Disadvantage: if the assumption is wrong, performance is poor in terms of misclassification. However, if a crude classification is acceptable, then this can be OK.

Normal (Gaussian) probability distribution: the common assumption is that the density distribution is normal. For a single variable X: mean E[X] = mu, variance E[(X - E[X])^2] = sigma^2, with density p(x) = (1 / (sigma sqrt(2 pi))) exp( -(x - mu)^2 / (2 sigma^2) ).
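A minimal sketch of the parametric two-class rule from the slide above, fitting a univariate Gaussian to each class and deciding by comparing the two densities (Python with numpy; the hair-length data are illustrative assumptions):

import numpy as np

def gaussian_pdf(x, mu, sigma):
    # Univariate normal density with mean mu and standard deviation sigma
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))

# Illustrative training data: hair length (cm) for two classes
male_hair   = np.array([0.0, 2.0, 5.0, 3.0, 1.0])
female_hair = np.array([25.0, 30.0, 20.0, 35.0, 28.0])

# Parametric estimation: fit mean and standard deviation for each class
mu1, s1 = male_hair.mean(), male_hair.std()
mu2, s2 = female_hair.mean(), female_hair.std()

def classify(x):
    # Decision rule: class one if p1(x) > p2(x), otherwise class two
    return "male" if gaussian_pdf(x, mu1, s1) > gaussian_pdf(x, mu2, s2) else "female"

print(classify(4.0))    # -> "male"
print(classify(27.0))   # -> "female"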

For multiple dimensions: x is the feature vector, mu the mean vector, and Sigma the covariance matrix, an n x n symmetric matrix with Sigma_ij = E[(X_i - mu_i)(X_j - mu_j)], the covariance between X_i and X_j. |Sigma| is the determinant of Sigma and Sigma^-1 the inverse of Sigma. The multivariate normal density is p(x) = (1 / ((2 pi)^(n/2) |Sigma|^(1/2))) exp( -(1/2) (x - mu)^T Sigma^-1 (x - mu) ).
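A minimal sketch of the multivariate case (Python with numpy; the explicit density is written out to mirror the formula above, and the 2-D data are illustrative assumptions):

import numpy as np

def multivariate_gaussian_pdf(x, mu, cov):
    # Multivariate normal density: needs the inverse and the determinant of the covariance
    n = len(mu)
    diff = x - mu
    inv_cov = np.linalg.inv(cov)
    det_cov = np.linalg.det(cov)
    norm = 1.0 / (np.power(2.0 * np.pi, n / 2.0) * np.sqrt(det_cov))
    return norm * np.exp(-0.5 * diff @ inv_cov @ diff)

# Estimate the mean vector and covariance matrix from illustrative 2-D data
data = np.array([[1.0, 2.0], [1.5, 1.8], [0.8, 2.3], [1.2, 2.1], [0.9, 1.9]])
mu  = data.mean(axis=0)
cov = np.cov(data, rowvar=False)   # n x n sample covariance matrix

print(multivariate_gaussian_pdf(np.array([1.0, 2.0]), mu, cov))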

Mahalanobis distance: r^2 = (x - mu)^T Sigma^-1 (x - mu), the squared distance of x from the mean mu measured relative to the covariance Sigma. (Figure: contours of constant Mahalanobis distance in the (u1, u2) plane.)
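A minimal sketch (Python with numpy; the mean and covariance are illustrative assumptions; note that with Sigma equal to the identity the Mahalanobis distance reduces to the ordinary Euclidean distance):

import numpy as np

def mahalanobis(x, mu, cov):
    # sqrt( (x - mu)^T  Sigma^-1  (x - mu) )
    diff = x - mu
    return np.sqrt(diff @ np.linalg.inv(cov) @ diff)

mu  = np.array([1.0, 2.0])
cov = np.array([[0.5, 0.1],
                [0.1, 0.2]])

x = np.array([1.5, 2.2])
print(mahalanobis(x, mu, cov))    # distance scaled by the covariance
print(np.linalg.norm(x - mu))     # ordinary Euclidean distance, for comparison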

Topic: the Hebbian learning rule.

The Hebbian learning rule is local: it involves only the two neurons concerned, independently of other variables. We will return to the Hebbian learning rule later in the course, in the context of PCA learning. There are other possible forms of learning demonstrated in experiments (see Nature Neuroscience, as in the previous lecture).

Biological learning vs. statistical learning. Biological learning: the Hebbian learning rule. "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." A and B: cooperation between two neurons. In mathematical terms, with w(t) the weight between the two neurons at time t: w(t+1) = w(t) + gamma r_A r_B, where gamma is a learning rate and r_A, r_B are the activities of neurons A and B.
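A minimal sketch of the Hebbian update (Python with numpy; the learning rate and the activity traces are illustrative assumptions):

import numpy as np

def hebbian_update(w, r_a, r_b, gamma=0.01):
    # w(t+1) = w(t) + gamma * r_A * r_B : the weight grows when A and B are active together
    return w + gamma * r_a * r_b

# Illustrative pre- and post-synaptic activity over a few time steps
r_a = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
r_b = np.array([1.0, 1.0, 0.0, 1.0, 0.0])

w = 0.5
for a, b in zip(r_a, r_b):
    w = hebbian_update(w, a, b)
print(w)   # increased only on the steps where both neurons were active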