
NIMIA Crema, Italy1 Identification and Neural Networks, 11. 10. 2001. I S R G. G. Horváth, Department of Measurement and Information Systems

NIMIA Crema, Italy2 Modular networks Why a modular approach? Motivations:  Biological  Learning  Computational  Implementation

NIMIA Crema, Italy3 Motivations Biological  Biological systems are not homogeneous  Functional specialization  Fault tolerance  Cooperation, competition  Scalability  Extendibility

NIMIA Crema, Italy4 Motivations Complexity of learning (divide and conquer)  Training of a complex network (many layers) by layer-by-layer learning  Speed of learning  Catastrophic interference, incremental learning  Mixing supervised and unsupervised learning  Hierarchical knowledge structure

NIMIA Crema, Italy5 Motivations Computational  The capacity of a network  The size of the network  Catastrophic interference  Generalization capability vs network complexity

NIMIA Crema, Italy6 Motivations Implementation (hardware)  The degree of parallelism  Number of connections  The length of physical connections  Fan out

NIMIA Crema, Italy7 Modular networks What modules? The modules disagree on some inputs  every module solves the same, whole problem, but in a different way (different modules)  every module solves a different task (sub-task): task decomposition (input space, output space)

NIMIA Crema, Italy8 Modular networks How to combine modules Cooperative modules  simple average  weighted average (fixed weights): optimal linear combination (OLC) of networks Competitive modules  majority vote  winner takes all Competitive/cooperative modules  weighted average (input-dependent weights): mixture of experts (MOE)
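A minimal Python sketch (NumPy assumed) of the cooperative and competitive combination rules listed on this slide; the function names and argument shapes are illustrative, not from the original lecture.

```python
import numpy as np

def simple_average(outputs):
    # outputs: array of shape (M, N) holding the N outputs of M modules
    return outputs.mean(axis=0)

def weighted_average(outputs, alpha):
    # cooperative combination with fixed weights alpha (one per module)
    return np.tensordot(alpha, outputs, axes=1)

def majority_vote(class_labels):
    # competitive combination: class_labels has shape (M, N), integer labels
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(),
                               axis=0, arr=class_labels)

def winner_takes_all(outputs, confidences):
    # the module with the highest confidence alone gives the output
    return outputs[np.argmax(confidences)]
```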

NIMIA Crema, Italy9 Modular networks Construction of modular networks  Task decomposition, subtask definition  Training modules for solving the subtasks  Integration of the results (cooperation and/or competition)

NIMIA Crema, Italy10 Modular networks Cooperative networks  Ensemble (average)  Optimal linear combination of networks  Disjoint subtasks Competitive networks  Ensemble (vote) Competitive/cooperative networks  Mixture of experts

NIMIA Crema, Italy11 Cooperative networks Ensemble of cooperating networks (classification/regression) The motivation  Heuristic explanation Different experts together can solve a problem better Complementary knowledge  Mathematical justification Accurate and diverse modules

NIMIA Crema, Italy12 Ensemble of networks Mathematical justification  Ensemble output  Ambiguity (diversity)  Individual error  Ensemble error  Constraint

NIMIA Crema, Italy13 Ensemble of networks Mathematical justification (cont’d)  Weighted error  Weighted diversity  Ensemble error  Averaging over the input distribution Solution: Ensemble of accurate and diverse networks
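A LaTeX reconstruction of the standard ambiguity decomposition behind the labels on these two slides, following Krogh and Vedelsby; the symbols ($y_i$, $\alpha_i$, $d$) are chosen here and may differ from those used on the original slides.

```latex
% Ensemble output, with the constraint \sum_i \alpha_i = 1,\ \alpha_i \ge 0
\bar{y}(x) = \sum_i \alpha_i\, y_i(x)
% Individual error and ambiguity (diversity) of module i
e_i(x) = \bigl(d(x) - y_i(x)\bigr)^2, \qquad
a_i(x) = \bigl(y_i(x) - \bar{y}(x)\bigr)^2
% Ensemble error = weighted error - weighted diversity
e(x) = \bigl(d(x) - \bar{y}(x)\bigr)^2
     = \sum_i \alpha_i e_i(x) - \sum_i \alpha_i a_i(x)
% Averaging over the input distribution gives E = \bar{E} - \bar{A}:
% an ensemble of accurate (small \bar{E}) and diverse (large \bar{A}) networks
```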

NIMIA Crema, Italy14 Ensemble of networks How to get accurate and diverse networks  different structures: more than one network type (e.g. MLP, RBF, CCN, etc.)  networks of different size and complexity (number of hidden units, number of layers, nonlinear function, etc.)  different learning strategies (BP, CG, random search, etc.), batch learning vs. sequential learning  different training algorithms, sample order, learning samples  different training parameters  different starting parameter values  different stopping criteria

NIMIA Crema, Italy15 Linear combination of networks [Figure: block diagram of networks NN 1 ... NN M with outputs y 1 ... y M, weighted by coefficients α 1 ... α M (plus a bias term α 0 with y 0 = 1) and summed by Σ, for input x]

NIMIA Crema, Italy16 Linear combination of networks Computation of optimal coefficients  simple average  α k depends on the input: for different input domains a different network alone gives the output  optimal values using the constraint  optimal values without any constraint (Wiener-Hopf equation)
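A small Python sketch of the unconstrained optimal linear combination: solving the normal equations below on training data is the sample version of the Wiener-Hopf equation mentioned on the slide. The function names and data layout are illustrative assumptions.

```python
import numpy as np

def olc_weights(Y, d):
    # Y: (N, M) matrix of the M module outputs for N training samples
    # d: (N,) vector of desired responses
    # Prepend a column of ones for the bias term (alpha_0 with y_0 = 1)
    Y1 = np.hstack([np.ones((Y.shape[0], 1)), Y])
    # Unconstrained optimum: least-squares solution of (Y1^T Y1) a = Y1^T d
    alpha, *_ = np.linalg.lstsq(Y1, d, rcond=None)
    return alpha

def olc_predict(Y, alpha):
    Y1 = np.hstack([np.ones((Y.shape[0], 1)), Y])
    return Y1 @ alpha
```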

NIMIA Crema, Italy17 Task decomposition Decomposition related to learning  before learning (subtask definition)  during learning (automatic task decomposition) Problem space decomposition  input space (input space clustering, definition of different input regions)  output space (desired response)

NIMIA Crema, Italy18 Task decomposition Decomposition into separate subproblems  K-class classification into K two-class problems (coarse decomposition)  Complex two-class problems into smaller two-class problems (fine decomposition)  Integration (module combination)

NIMIA Crema, Italy19 Task decomposition A 3-class problem

NIMIA Crema, Italy20 Task decomposition [Figure: 3 classes, 2 small classes]

NIMIA Crema, Italy21 Task decomposition [Figure: 3 classes, 2 classes, 2 small classes]

NIMIA Crema, Italy22 Task decomposition [Figure: 3 classes, 2 small classes]

NIMIA Crema, Italy23 Task decomposition [Figure: pairwise modules M 12, M 13, M 23 (with an inverter, INV) combined through MIN units to give the class outputs C 1, C 2, C 3 from the input]

NIMIA Crema, Italy24 Task decomposition A two-class problem decomposed into subtasks

NIMIA Crema, Italy25 Task decomposition [Figure: logic view of the decomposition: (M 11 AND M 12) OR (M 21 AND M 22)]

NIMIA Crema, Italy26 Task decomposition [Figure: network view: MIN units combine M 11, M 12 and M 21, M 22; a MAX unit combines the two MIN outputs into the class output C 1 from the input]

NIMIA Crema, Italy27 Task decomposition Training set decomposition:  Original training set  Training set for each of the K two-class problems  Each of the two-class problems is divided into K-1 smaller two-class problems [using an inverter module, (K-1)/2 is actually enough]

NIMIA Crema, Italy28 Task decomposition A practical example: zip code recognition. [Figure: preprocessing pipeline: 16 x 16 input digit image, normalization, edge detection with Kirsch masks (horizontal, vertical, diagonal \ and diagonal /), giving 4 feature maps of 16 x 16, reduced to 4 matrices of 8 x 8]

NIMIA Crema, Italy29 Task decomposition Zip code recognition (handwritten character recognition), modular solution: 45 = K(K-1)/2 pairwise modules (neurons, with K = 10), 10 AND gates (MIN operator). [Figure: the modular network with its inputs]

NIMIA Crema, Italy30 Mixture of Experts (MOE) [Figure: MOE architecture: experts 1 ... M produce outputs μ 1 ... μ M from the input x; a gating network produces weights g 1 ... g M; the final output is the weighted sum Σ g i μ i]

NIMIA Crema, Italy31 Mixture of Experts (MOE) The output is the weighted sum of the outputs of the experts, y = Σ_i g_i μ_i(x, θ_i), where θ_i is the parameter of the i-th expert. The output of the gating network is the “softmax” function g_i = exp(s_i) / Σ_j exp(s_j), where s_i depends on the parameter of the gating network.
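A minimal Python sketch of this forward computation, assuming linear experts (as in the applications slide later) and a linear-softmax gating network; the parameter names and shapes (W_experts, V_gate) are illustrative assumptions.

```python
import numpy as np

def softmax(s):
    s = s - s.max()            # subtract the max for numerical stability
    e = np.exp(s)
    return e / e.sum()

def moe_output(x, W_experts, V_gate):
    # W_experts: (M, n) rows of linear expert parameters, mu_i = W_i x
    # V_gate:    (M, n) rows of gating parameters, g = softmax(V x)
    mu = W_experts @ x         # expert outputs mu_1 ... mu_M
    g = softmax(V_gate @ x)    # gating weights, nonnegative and summing to 1
    return g @ mu, mu, g       # y = sum_i g_i mu_i
```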

NIMIA Crema, Italy32 Mixture of Experts (MOE)  Probabilistic interpretation  The probabilistic model with true parameters: the output is generated by a mixture, P(d | x) = Σ_i g_i(x) P(d | x, θ_i)  the gating output g_i plays the role of the a priori probability that expert i generated the data

NIMIA Crema, Italy33 Mixture of Experts (MOE) Training  Training data  Probability of generating output from the input  The log likelihood function (maximum likelihood estimation)
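A LaTeX reconstruction of the likelihood this slide refers to; the notation (training pairs indexed by $l$, overall parameter set $\Theta$) is chosen here and may differ from the lost formulas.

```latex
% Probability of generating output d^{(l)} from input x^{(l)}
P\bigl(d^{(l)} \mid x^{(l)}, \Theta\bigr)
  = \sum_{i=1}^{M} g_i\bigl(x^{(l)}\bigr)\,
    P\bigl(d^{(l)} \mid x^{(l)}, \theta_i\bigr)
% Log likelihood over the training set (maximum likelihood estimation)
L(\Theta) = \sum_{l} \log \sum_{i=1}^{M}
            g_i\bigl(x^{(l)}\bigr)\,
            P\bigl(d^{(l)} \mid x^{(l)}, \theta_i\bigr)
```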

NIMIA Crema, Italy34 Mixture of Experts (MOE) Training (cont’d)  Gradient method  Update of the parameter of the expert network  Update of the parameter of the gating network

NIMIA Crema, Italy35 Mixture of Experts (MOE) Training (cont’d)  A priori probability  A posteriori probability
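The a posteriori responsibility and the resulting gradient directions, reconstructed in LaTeX in the usual Jacobs-Jordan form; this assumes Gaussian experts and a linear gating network, and the proportionality constants stand in for unspecified learning rates.

```latex
% A posteriori probability (responsibility) of expert i for sample (x, d)
h_i = \frac{g_i\, P(d \mid x, \theta_i)}
           {\sum_j g_j\, P(d \mid x, \theta_j)}
% Gradient updates (Gaussian experts, linear gating s_i = v_i^T x)
\Delta \theta_i \propto h_i\,(d - \mu_i)\,
                        \frac{\partial \mu_i}{\partial \theta_i},
\qquad
\Delta v_i \propto (h_i - g_i)\, x
```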

NIMIA Crema, Italy36 Mixture of Experts (MOE) Training (cont’d)  EM (Expectation Maximization) algorithm A general iterative technique for maximum likelihood estimation Introducing hidden variables Defining a log likelihood function  Two steps: Expectation of the hidden variables Maximization of the log likelihood function

NIMIA Crema, Italy37 EM (Expectation Maximization) algorithm A simple example: estimating the means of k (2) Gaussians. [Figure: measurements drawn from two Gaussian densities f(y|μ 1) and f(y|μ 2)]

NIMIA Crema, Italy38 EM (Expectation Maximization) algorithm A simple example: estimating the means of k (2) Gaussians  hidden variables for every observation, (x(l), z l1, z l2), indicating which Gaussian generated the sample  likelihood function  log likelihood function  expected value of the hidden variables given the current estimates of the means

NIMIA Crema, Italy39 Mixture of Experts (MOE) A simple example: estimating the means of k (2) Gaussians  Expected log likelihood function  The estimate of the means
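A compact Python sketch of this two-Gaussian example: EM estimating only the means, under the assumptions (not stated explicitly in the transcript) of equal priors and a known common variance.

```python
import numpy as np

def em_two_gaussian_means(y, sigma=1.0, n_iter=50):
    # y: 1-D array of measurements; estimate the means mu_1, mu_2 of a
    # 50/50 mixture of two Gaussians with known common variance sigma^2.
    mu = np.array([y.min(), y.max()], dtype=float)   # crude initialisation
    for _ in range(n_iter):
        # E-step: expected hidden variables z_lj = P(component j | y_l)
        d2 = (y[:, None] - mu[None, :]) ** 2
        resp = np.exp(-d2 / (2.0 * sigma ** 2))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: means that maximise the expected log-likelihood
        mu = (resp * y[:, None]).sum(axis=0) / resp.sum(axis=0)
    return mu
```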

NIMIA Crema, Italy40 Mixture of Experts (MOE) Applications  Simple experts: linear experts  ECG diagnostics  Mixture of Kalman filters Discussion: comparison to non-modular architecture

NIMIA Crema, Italy41 Support vector machines A new approach: gives answers to questions not solved by the classical approach  The size of the network  The generalization capability

NIMIA Crema, Italy42 Support vector machines Classification, optimal hyperplane. [Figure: classical neural learning vs. Support Vector Machine]

NIMIA Crema, Italy43 VC dimension

NIMIA Crema, Italy44 Structural risk minimization (structural error minimization)

NIMIA Crema, Italy45 Support vector machines Linearly separable two-class problem  separating hyperplane  Optimal hyperplane

NIMIA Crema, Italy46 Support vector machines Geometric interpretation [Figure: the decision function d(x) and the margin around the separating hyperplane, with sample points x 1 and x 2]

NIMIA Crema, Italy47 Support vector machines Criterion function, Lagrange function (a constrained optimization problem), optimality conditions, dual problem, support vectors, optimal hyperplane
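A LaTeX reconstruction of the constrained optimization problem and its dual for the linearly separable case; it follows the standard Vapnik formulation, which is presumably what the lost formulas showed (class labels d_i in {+1, -1}).

```latex
% Criterion function with the margin constraints
\min_{\mathbf{w},b}\ \tfrac{1}{2}\lVert \mathbf{w}\rVert^2
\quad \text{s.t.} \quad d_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1,\ i=1,\dots,P
% Lagrange function
L(\mathbf{w},b,\boldsymbol{\alpha}) = \tfrac{1}{2}\lVert \mathbf{w}\rVert^2
  - \sum_i \alpha_i\bigl[d_i(\mathbf{w}^T\mathbf{x}_i + b) - 1\bigr]
% Dual problem
\max_{\boldsymbol{\alpha}}\ \sum_i \alpha_i
  - \tfrac{1}{2}\sum_i\sum_j \alpha_i\alpha_j d_i d_j\,\mathbf{x}_i^T\mathbf{x}_j
\quad \text{s.t.}\quad \sum_i \alpha_i d_i = 0,\ \alpha_i \ge 0
% Optimal hyperplane built from the support vectors (\alpha_i^* > 0)
\mathbf{w}^* = \sum_i \alpha_i^* d_i \mathbf{x}_i
```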

NIMIA Crema, Italy48 Support vector machines Linearly nonseparable case: separating hyperplane, criterion function, Lagrange function, support vectors, optimal hyperplane. [Figure: optimal hyperplane]
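For the nonseparable case the criterion is extended with slack variables; this LaTeX sketch uses the standard C-parameterized soft margin, which matches the elements listed on the slide.

```latex
\min_{\mathbf{w},b,\boldsymbol{\xi}}\
  \tfrac{1}{2}\lVert \mathbf{w}\rVert^2 + C\sum_i \xi_i
\quad \text{s.t.}\quad
  d_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0
% The dual has the same form as in the separable case,
% with the extra box constraint 0 \le \alpha_i \le C
```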

NIMIA Crema, Italy49 Support vector machines Nonlinear separation  separating hyperplane  decision surface  kernel function  criterion function

NIMIA Crema, Italy50 Support vector machines Examples of SVM  Polynomial  RBF  MLP
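The three kernels named on this slide, written out in LaTeX in their usual forms; the specific hyperparameter symbols (p, σ, β 0, β 1) are chosen here.

```latex
% Polynomial
K(\mathbf{x},\mathbf{x}_i) = (\mathbf{x}^T\mathbf{x}_i + 1)^p
% RBF (Gaussian)
K(\mathbf{x},\mathbf{x}_i)
  = \exp\!\Bigl(-\tfrac{\lVert \mathbf{x}-\mathbf{x}_i\rVert^2}{2\sigma^2}\Bigr)
% MLP (sigmoid)
K(\mathbf{x},\mathbf{x}_i) = \tanh(\beta_0\,\mathbf{x}^T\mathbf{x}_i + \beta_1)
```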

NIMIA Crema, Italy51 Support vector machines Example: polynomial  basis functions  kernel function
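A short usage sketch with scikit-learn (assumed available; not part of the lecture) showing a polynomial-kernel SVM classifier on toy data and how the support vectors fall out of the solution.

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class data (illustrative only): not linearly separable
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)

# Polynomial kernel K(x, x_i) = (x^T x_i + 1)^p with p = 2
clf = SVC(kernel="poly", degree=2, coef0=1.0, C=10.0)
clf.fit(X, y)

print("number of support vectors:", clf.support_vectors_.shape[0])
print("training accuracy:", clf.score(X, y))
```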

NIMIA Crema, Italy52 SVM (classification) Separable samples: minimize the criterion function subject to the margin constraints. Not separable samples: minimize the criterion function extended with the slack-variable penalty, subject to the relaxed constraints. By minimizing the norm of the weight vector we maximize the distance between the classes, whilst we also control the VC dimension.

NIMIA Crema, Italy53 SVR (regression) [Figure: the ε-insensitive cost function C(·)]

NIMIA Crema, Italy54 SVR (regression) Constraints:Minimize:

NIMIA Crema, Italy55 SVR (regression) Lagrange function dual problem constraints support vectors solution
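A LaTeX sketch of the ε-insensitive SVR problem and the shape of its solution, filling in the standard formulas behind the labels on these slides; the symbols are chosen here and the feature map φ corresponds to the kernel used.

```latex
% Primal problem
\min_{\mathbf{w},b,\boldsymbol{\xi},\boldsymbol{\xi}^*}\
  \tfrac{1}{2}\lVert \mathbf{w}\rVert^2 + C\sum_i (\xi_i + \xi_i^*)
\quad \text{s.t.}\quad
  d_i - \mathbf{w}^T\boldsymbol{\varphi}(\mathbf{x}_i) - b \le \varepsilon + \xi_i,\
  \mathbf{w}^T\boldsymbol{\varphi}(\mathbf{x}_i) + b - d_i \le \varepsilon + \xi_i^*,\
  \xi_i,\xi_i^* \ge 0
% Dual constraints: 0 \le \alpha_i,\alpha_i^* \le C,\quad
% \sum_i(\alpha_i - \alpha_i^*) = 0
% Solution in terms of the support vectors (nonzero \alpha_i or \alpha_i^*)
y(\mathbf{x}) = \sum_i (\alpha_i - \alpha_i^*)\, K(\mathbf{x},\mathbf{x}_i) + b
```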

NIMIA Crema, Italy56-59 SVR (regression) [Figures: SVR regression examples]

NIMIA Crema, Italy60 Support vector machines Main advantages  generalization  size of the network  centre parameters for RBF  linear-in-the-parameter structure  noise immunity

NIMIA Crema, Italy61 Support vector machines Main disadvantages  computationally intensive (quadratic optimization)  hyperparameter selection, VC dimension (classification)  batch processing

NIMIA Crema, Italy62 Support vector machines Variants  LS-SVM  basic criterion function  Advantages: easier to compute, adaptivity
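The LS-SVM criterion mentioned here, reconstructed in LaTeX in the usual Suykens form: the inequality constraints are replaced by equalities with squared errors, so training reduces to solving a linear system rather than a quadratic program (which is why it is easier to compute).

```latex
\min_{\mathbf{w},b,\mathbf{e}}\
  \tfrac{1}{2}\lVert \mathbf{w}\rVert^2 + \tfrac{C}{2}\sum_i e_i^2
\quad \text{s.t.}\quad
  d_i = \mathbf{w}^T\boldsymbol{\varphi}(\mathbf{x}_i) + b + e_i,\ i=1,\dots,P
```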

NIMIA Crema, Italy63 Mixture of SVMs Problem of hyper-parameter selection for SVMs  Different SVMs, with different hyper-parameters  Soft separation of the input space

NIMIA Crema, Italy64 Mixture of SVMs

NIMIA Crema, Italy65 Boosting techniques Boosting by filtering Boosting by subsampling Boosting by reweighting

NIMIA Crema, Italy66 Boosting techniques Boosting by filtering

NIMIA Crema, Italy67 Boosting techniques Boosting by subsampling

NIMIA Crema, Italy68 Boosting techniques Boosting by reweighting
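A hedged Python sketch of boosting by reweighting in the AdaBoost.M1 style; the transcript does not give the exact reweighting rule, so this follows the common scheme, with scikit-learn decision stumps as the weak modules and labels assumed to be in {-1, +1}.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=20):
    # Boosting by reweighting: each round re-emphasises misclassified samples.
    n = len(y)
    w = np.full(n, 1.0 / n)                  # sample weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)
        if err >= 0.5:                       # weak learner no better than chance
            break
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)       # up-weight the misclassified samples
        w /= w.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(X, learners, alphas):
    # Weighted vote of the weak modules
    votes = sum(a * clf.predict(X) for a, clf in zip(learners, alphas))
    return np.sign(votes)
```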

NIMIA Crema, Italy69 Other modular architectures

NIMIA Crema, Italy70 Other modular architectures

NIMIA Crema, Italy71 Other modular architectures Modular classifiers  Decoupled modules  Hierarchical modules  Network ensemble (linear combination)  Network ensemble (decision, voting)

NIMIA Crema, Italy72 Modular architectures