Supervised Learning: Linear Perceptron NN

Distinction Between Approximation-Based vs. Decision-Based NNs. The teacher in an approximation-based NN is quantitative, taking real or complex values. The teacher in a decision-based NN is a symbolic class label rather than a numeric value.

Decision-Based NN (DBNN): linear perceptron; discriminant function (score function); reinforced and anti-reinforced learning rules; hierarchical and modular structures.

[Figure: DBNN classification loop — the next pattern is presented, the discriminant functions φ1(x, w), φ2(x, w), …, φM(x, w) are evaluated, and the outcome is marked correct or incorrect.]

Supervised Learning: Linear Perceptron NN

Two Classes: Linear Perceptron Learning Rule. The linear discriminant function is φ_j(x, w_j) = x^T w_j + w_0 = z^T ŵ_j (= z^T w), with gradient ∇φ_j(z, w_j) = z. Upon the presentation of the m-th training pattern z^(m), the weight vector w^(m) is updated as w^(m+1) = w^(m) + η (t^(m) − d^(m)) z^(m), where η is a positive learning rate.
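To make the update concrete, here is a minimal Python sketch (not from the slides) of one pass of the two-class rule; the names `perceptron_update` and `eta`, the teacher/decision encoding t, d ∈ {0, 1}, and the toy data are all illustrative assumptions.

```python
import numpy as np

def perceptron_update(w, z, t, eta=0.1):
    """One step of the two-class linear perceptron rule.

    w   : current weight vector (augmented; z already carries a bias component)
    z   : augmented training pattern z^(m)
    t   : teacher value t^(m) (1 for class 1, 0 for class 2) -- assumed encoding
    eta : positive learning rate
    """
    d = 1 if z @ w > 0 else 0          # network decision d^(m)
    return w + eta * (t - d) * z       # w^(m+1) = w^(m) + eta (t^(m) - d^(m)) z^(m)

# Example: one pass over a toy linearly separable set
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
T = np.array([1, 1, 0, 0])
Z = np.hstack([X, np.ones((len(X), 1))])   # append bias component
w = np.zeros(3)
for z, t in zip(Z, T):
    w = perceptron_update(w, z, t)
```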

Linear Perceptron: Convergence Theorem (Two Classes). If a set of training patterns is linearly separable, then the linear perceptron learning algorithm converges to a correct solution in a finite number of iterations.

It converges when the learning rate η is small enough: w^(m+1) = w^(m) + η (t^(m) − d^(m)) z^(m).

Multiple Classes. [Figure: contrast between a linearly separable and a strongly linearly separable arrangement of classes.]

Linear Perceptron Convergence Theorem (Multiple Classes). If the given multiple-class training set is linearly separable, then the linear perceptron learning algorithm converges to a correct solution after a finite number of iterations.

Multiple Classes: Linear Perceptron Learning Rule (linear separability)

p_1j = [ z  0  0  …  −z  0  …  0 ], with z in the first block and −z in the j-th block.
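The vector p_1j above is the usual device for reducing the multi-class problem to a single two-class one: for a pattern z of class i (class 1 in the slide), one augmented vector is formed per rival class j, with z in block i and −z in block j, and a correct stacked weight vector w = [w_1, …, w_M] must satisfy w · p_ij > 0 for all of them. A hedged Python sketch of that construction (the function name and the 0-based block layout are assumptions, not from the slides):

```python
import numpy as np

def kesler_vectors(z, i, M):
    """Build augmented patterns p_ij = [0 ... z ... -z ... 0] for a pattern z
    of class i against every other class j (0-based indices).
    Each p_ij has M blocks of len(z); block i holds z, block j holds -z."""
    d = len(z)
    vectors = []
    for j in range(M):
        if j == i:
            continue
        p = np.zeros(M * d)
        p[i * d:(i + 1) * d] = z
        p[j * d:(j + 1) * d] = -z
        vectors.append(p)
    return vectors

# A stacked weight vector w = [w_1, ..., w_M] classifies z correctly
# exactly when w @ p_ij > 0 for every augmented vector p_ij.
```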

DBNN Structure for Nonlinear Discriminant Function. [Figure: inputs x and y feed the discriminant functions φ1(x, w), φ2(x, w), φ3(x, w), whose outputs compete in a MAXNET.]

[Figure: DBNN training — inputs x and y drive subnets with weights w1, w2, w3, a MAXNET selects the winner, and training takes place only if the teacher indicates the need.]

The decision-based learning rule is based on a minimal updating principle: it tends to avoid or minimize unnecessary side effects due to overtraining. In the first scenario, the pattern is already correctly classified by the current network; then there is no updating attributed to that pattern, and the learning process proceeds with the next training pattern. In the second scenario, the pattern is incorrectly classified to another winning class. In this case, the parameters of two classes must be updated: the score of the winning class should be reduced by the anti-reinforced learning rule, while the score of the correct (but not winning) class should be enhanced by the reinforced learning rule.

 w j  w j  j  x  w) Reinforced and Anti-reinforced Learning  w i  w i  i  x  w) Reinforced Learning Anti-Reinforced Learning Suppose that the m -th training patternn x (m), j = arg max i≠j φ( x (m), Θ j ) The leading challenger is denoted by x (m) is known to belong to the i-th class.

For the Simple RBF Discriminant Function: φ_j(x, w_j) = −0.5 ||x − w_j||^2, so ∇φ_j(x, w_j) = (x − w_j). Upon the presentation of the m-th training pattern x^(m), the weights are updated as: reinforced learning w_i ← w_i + η (x − w_i); anti-reinforced learning w_j ← w_j − η (x − w_j).
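Under the simple RBF score above, the two rules reduce to moving centroids toward or away from the pattern, and only the winning class and the true class are touched, in line with the minimal updating principle. A minimal Python sketch of one such update (the names and the one-centroid-per-class simplification are assumptions):

```python
import numpy as np

def dbnn_rbf_update(W, x, true_class, eta=0.05):
    """One decision-based update for a simple RBF DBNN (one centroid per class).

    W          : (M, d) array of class centroids w_1 ... w_M
    x          : training pattern
    true_class : index i of the class x belongs to
    Only the winner and the true class are touched (minimal updating)."""
    scores = -0.5 * np.sum((x - W) ** 2, axis=1)    # phi_j(x, w_j)
    winner = int(np.argmax(scores))
    if winner == true_class:
        return W                                     # already correct: no update
    W = W.copy()
    W[true_class] += eta * (x - W[true_class])       # reinforced learning
    W[winner]     -= eta * (x - W[winner])           # anti-reinforced learning
    return W
```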

Decision-Based Learning Rule. The learning scheme of the DBNN consists of two phases: locally unsupervised learning and globally supervised learning.

Locally Unsupervised Learning via VQ or EM Clustering. Several approaches can be used to estimate the number of hidden nodes, and the initial clustering can be determined by VQ or EM clustering methods. EM allows the final decision to incorporate prior information, which can be instrumental for multiple-expert or multiple-channel information fusion.

Globally Supervised Learning Rules. The objective of learning is minimum classification error (not maximum likelihood estimation). Inter-class mutual information is used to fine-tune the decision boundaries (i.e., the globally supervised learning). In this phase, the DBNN applies the reinforced/anti-reinforced learning rule [Kung95] or the discriminative learning rule [Juang92] to adjust network parameters. Only misclassified patterns need to be involved in this training phase.

Pictorial Presentation of Hierarchical DBNN. [Figure: a 2-D scatter of patterns from classes a, b, and c, with each class region covered by several local subclusters.]

Discriminant function (score function): an LBF function (or a mixture of LBFs), an RBF function (or a mixture of RBFs), a prediction error function, or a likelihood function (e.g., an HMM).

Hierarchical and Modular DBNN: subcluster DBNN; probabilistic DBNN; local experts via K-means or EM; reinforced and anti-reinforced learning.

Subcluster DBNN. [Figure: subcluster discriminant functions competing through a MAXNET.]

Subcluster Decision-Based Learning Rule

Probabilistic DBNN

Probabilistic DBNN. [Figure: probabilistic subnetworks feeding a MAXNET.]

A subnetwork of a Probabilistic DBNN is basically a mixture of local experts. [Figure: k-th subnetwork — the input x feeds RBF local experts producing P(y|x, θ_1), P(y|x, θ_2), …, which are combined into the subnetwork output P(y|x, Θ_k).]
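Read this way, a subnetwork score is just a weighted sum of its local experts' densities. A hedged Python sketch with spherical Gaussian experts (the mixing weights `priors`, the shared `sigma`, and the function name are illustrative assumptions, not the exact PDBNN parameterization):

```python
import numpy as np

def subnetwork_score(x, centroids, priors, sigma=1.0):
    """Mixture-of-local-experts score of one PDBNN subnetwork.

    centroids : (R, d) centres of the R local experts (RBF units)
    priors    : (R,) mixing weights, summing to 1
    Returns the mixture likelihood p(x | class k)."""
    d = len(x)
    diff = x - centroids
    expo = -0.5 * np.sum(diff ** 2, axis=1) / sigma ** 2
    norm = (2 * np.pi * sigma ** 2) ** (d / 2)
    return float(np.sum(priors * np.exp(expo) / norm))
```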

Probabilistic Decision-Based Neural Networks

Training of Probabilistic DBNN. Selection of initial local experts: intra-class training, unsupervised training, EM (probabilistic) training. Training of the experts: inter-class training, supervised training, reinforced and anti-reinforced learning.

Probabilistic Decision-Based Neural Networks: training procedure. [Flowchart: feature vectors enter the locally unsupervised phase (K-means, K-NNs, EM); the globally supervised phase then classifies each vector against its class ID, applies reinforced/anti-reinforced learning to the misclassified vectors, and repeats until convergence.]
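Putting the flow chart into code form: the locally unsupervised phase initializes each class's centroids (here a crude random pick standing in for K-means/EM), and the globally supervised phase repeats reinforced/anti-reinforced moves on misclassified patterns until no mistakes remain. This is a simplified sketch of the procedure, not the exact PDBNN algorithm; the function name, the centroid initialization, and the convergence test are assumptions.

```python
import numpy as np

def train_dbnn(X, labels, n_classes, k=4, eta=0.05, epochs=50):
    """Two-phase DBNN training sketch (illustrative).

    Phase 1 (locally unsupervised): pick k centroids per class from that
    class's own patterns (a stand-in for K-means / EM clustering).
    Phase 2 (globally supervised): for each misclassified pattern, move the
    nearest centroid of the true class toward it (reinforced) and the
    nearest centroid of the winning class away from it (anti-reinforced)."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(0)
    centroids = [X[labels == c][rng.choice(np.sum(labels == c), k, replace=False)]
                 for c in range(n_classes)]

    def scores(x):
        # class score = best (least-distance) subcluster score
        return [np.max(-0.5 * np.sum((x - C) ** 2, axis=1)) for C in centroids]

    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(X, labels):
            winner = int(np.argmax(scores(x)))
            if winner == y:
                continue                          # minimal updating: skip correct ones
            mistakes += 1
            Ct, Cw = centroids[y], centroids[winner]
            it = np.argmin(np.sum((x - Ct) ** 2, axis=1))
            iw = np.argmin(np.sum((x - Cw) ** 2, axis=1))
            Ct[it] += eta * (x - Ct[it])          # reinforced learning
            Cw[iw] -= eta * (x - Cw[iw])          # anti-reinforced learning
        if mistakes == 0:                         # converged
            break
    return centroids
```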

Probabilistic Decision-Based Neural Networks. [Figure: GMM vs. PDBNN decision boundaries on a 2-D vowel problem.]

Difference between MOE and DBNN. For the MOE, the influence of the training patterns on each expert is regulated by the gating network (which is itself under training), so that as training proceeds, the training patterns have higher influence on the nearby experts and lower influence on the far-away ones. (The MOE updates all the classes.) Unlike the MOE, the DBNN makes use of both unsupervised (EM-type) and supervised (decision-based) learning rules. The DBNN uses only misclassified training patterns for its globally supervised learning, and it updates only the "winner" class and the class to which the misclassified pattern actually belongs. Its training strategy is to abide by a "minimal updating principle".

DBNN/PDBNN Applications: OCR (DBNN), texture segmentation (DBNN), mammogram diagnosis (PDBNN), face detection (PDBNN), face recognition (PDBNN), money recognition (PDBNN), multimedia library (DBNN).

OCR Classification (DBNN)

Image Texture Classification (DBNN)

Face Detection (PDBNN)

Face Recognition (PDBNN)

show movies

Multimedia Library (PDBNN)

MATLAB Assignment #4: DBNN to separate two classes. (a) RBF DBNN with 4 centroids per class. (b) RBF DBNN with 4 and 6 centroids for the green and blue classes respectively (ratio = 2:1).

RBF-BP NN for Dynamic Resource Allocation: use content to determine the renegotiation time; use content/ST-traffic to estimate how much resource to request. A neural network traffic predictor yields smaller prediction MSE and higher link utilization.

Intelligent Media Agent. Modern information technology in the Internet era should support interactive and intelligent processing that transforms and transfers information. The integration of signal processing and neural network techniques could be a versatile tool for a broad spectrum of multimedia applications.

EM Applications: uncertain clustering/model, channel confidence. [Figure: Expert 1 and Expert 2 attached to Channel 1 and Channel 2.]

Channel Fusion

Classes-in-channel network: Sensor = Channel = Expert. [Figure: per-channel network structure.]

Sensor Fusion: human sensory modalities vs. computer sensory modalities. [Figure: auditory "Ba" and visual "Ga" are perceived as "Da", illustrating cross-modal fusion.]

Fusion Example: Toy Car Recognition

Probabilistic Decision-Based Neural Networks

Probabilistic Decision-Based Neural Networks: training procedure. [Flowchart: feature vectors enter the locally unsupervised phase (K-means, K-NNs, EM); the globally supervised phase then classifies each vector against its class ID, applies reinforced/anti-reinforced learning to the misclassified vectors, and repeats until convergence.]

Probabilistic Decision-Based Neural Networks. [Figure: GMM vs. PDBNN decision boundaries on a 2-D vowel problem.]

References:
[1] Lin, S.H., Kung, S.Y. and Lin, L.J. (1997). "Face recognition/detection by probabilistic decision-based neural network," IEEE Trans. on Neural Networks, 8(1).
[2] Mak, M.W. et al. (1994). "Speaker Identification using Multi Layer Perceptrons and Radial Basis Functions Networks," Neurocomputing, 6(1).