Slide 1: Identification and Neural Networks
G. Horváth, ISRG, Department of Measurement and Information Systems
NIMIA, Crema, Italy, 11 October 2001
Slide 2: Modular networks
Why a modular approach? Motivations: biological, learning, computational, implementation
Slide 3: Motivations (biological)
- Biological systems are not homogeneous
- Functional specialization
- Fault tolerance
- Cooperation, competition
- Scalability, extendibility
Slide 4: Motivations
- Complexity of learning (divide and conquer)
- Training of complex networks (many layers): layer-by-layer learning
- Speed of learning
- Catastrophic interference, incremental learning
- Mixing supervised and unsupervised learning
- Hierarchical knowledge structure
Slide 5: Motivations (computational)
- The capacity of a network
- The size of the network
- Catastrophic interference
- Generalization capability vs. network complexity
Slide 6: Motivations (implementation, hardware)
- Degree of parallelism
- Number of connections
- Length of physical connections
- Fan-out
Slide 7: Modular networks
What kind of modules?
- The modules disagree on some inputs
- Every module solves the same, whole problem, but in different ways (different modules)
- Every module solves a different task (sub-task): task decomposition (input space, output space)
Slide 8: Modular networks
How to combine modules?
- Cooperative modules: simple average; weighted average (fixed weights); optimal linear combination (OLC) of networks
- Competitive modules: majority vote; winner takes all
- Competitive/cooperative modules: weighted average (input-dependent weights); mixture of experts (MOE)
Slide 9: Modular networks
Construction of modular networks
- Task decomposition, subtask definition
- Training modules to solve the subtasks
- Integration of the results (cooperation and/or competition)
Slide 10: Modular networks
- Cooperative networks: ensemble (average); optimal linear combination of networks; disjoint subtasks
- Competitive networks: ensemble (vote)
- Competitive/cooperative networks: mixture of experts
Slide 11: Cooperative networks
Ensemble of cooperating networks (classification/regression)
- Motivation
- Heuristic explanation: different experts together can solve a problem better; complementary knowledge
- Mathematical justification: accurate and diverse modules
Slide 12: Ensemble of networks
Mathematical justification: ensemble output, ambiguity (diversity), individual error, ensemble error, constraint (formulas on the slide; reconstructed below)
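The defining formulas on this slide were not captured in the text export. A plausible reconstruction, assuming the standard Krogh-Vedelsby ambiguity decomposition with combining weights $\alpha_k$, module outputs $f_k(x)$ and desired response $d(x)$:

$\bar{f}(x) = \sum_k \alpha_k f_k(x)$  (ensemble output)
$a_k(x) = \big(f_k(x) - \bar{f}(x)\big)^2$  (ambiguity of module $k$)
$e_k(x) = \big(d(x) - f_k(x)\big)^2$  (individual error)
$e(x) = \big(d(x) - \bar{f}(x)\big)^2$  (ensemble error)
$\sum_k \alpha_k = 1, \ \alpha_k \ge 0$  (constraint)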
Slide 13: Ensemble of networks
Mathematical justification (cont'd): weighted error, weighted diversity, ensemble error, averaging over the input distribution (reconstructed below)
Solution: an ensemble of accurate and diverse networks
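Continuing the reconstruction under the same assumptions:

$\bar{e}(x) = \sum_k \alpha_k e_k(x)$  (weighted error)
$\bar{a}(x) = \sum_k \alpha_k a_k(x)$  (weighted diversity)
$e(x) = \bar{e}(x) - \bar{a}(x)$  (ensemble error)
$E = \bar{E} - \bar{A}$  after averaging over the input distribution

The ensemble error is therefore small when the modules are individually accurate (small $\bar{E}$) and at the same time diverse (large $\bar{A}$), which is the formal content of "accurate and diverse networks".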
Slide 14: Ensemble of networks
How to get accurate and diverse networks
- Different structures: more than one network type (e.g. MLP, RBF, CCN, etc.)
- Networks of different size and complexity (number of hidden units, number of layers, nonlinear functions, etc.)
- Different learning strategies (BP, CG, random search, etc.); batch learning vs. sequential learning
- Different training algorithms, sample order, learning samples
- Different training parameters, different starting parameter values, different stopping criteria
Slide 15: Linear combination of networks
[Figure: the input x feeds networks NN_1 ... NN_M; their outputs y_1 ... y_M are weighted by coefficients α_1 ... α_M, a bias term uses α_0 with y_0 = 1, and the weighted outputs are summed.]
Slide 16: Linear combination of networks
Computation of the optimal coefficients
- Simple average
- Input-dependent coefficients: for different input domains a different network alone gives the output
- Optimal values using the constraint
- Optimal values without any constraint: Wiener-Hopf equation (reconstructed below)
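The optimization was given as formulas on the slide; a minimal reconstruction of the unconstrained case, assuming a mean-squared-error criterion and the notation of the block diagram:

$\boldsymbol{\alpha}^{*} = \mathbf{R}^{-1}\mathbf{p}$, where $\mathbf{R} = E[\mathbf{y}(x)\mathbf{y}(x)^{T}]$ is the correlation matrix of the module outputs and $\mathbf{p} = E[d(x)\,\mathbf{y}(x)]$ is their cross-correlation with the desired response (the Wiener-Hopf equation). With the constraint $\sum_k \alpha_k = 1$ the optimum follows from the same quantities via a Lagrange-multiplier formulation.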
Slide 17: Task decomposition
- Decomposition related to learning: before learning (subtask definition); during learning (automatic task decomposition)
- Problem-space decomposition: input space (input-space clustering, definition of different input regions); output space (desired response)
Slide 18: Task decomposition
Decomposition into separate subproblems
- K-class classification → K two-class problems (coarse decomposition)
- Complex two-class problems → smaller two-class problems (fine decomposition)
- Integration (module combination)
Slide 19: Task decomposition
[Figure: a 3-class problem.]
Slide 20: Task decomposition
[Figure: 3 classes → 2 small classes.]
Slide 21: Task decomposition
[Figure: 3 classes → 2 classes → 2 small classes.]
Slide 22: Task decomposition
[Figure: 3 classes → 2 small classes.]
Slide 23: Task decomposition
[Figure: pairwise modules M_12, M_13, M_23 receive the input; their outputs (inverted by an INV module where needed) are combined by MIN units to give the class outputs C_1, C_2, C_3.]
Slide 24: Task decomposition
[Figure: a two-class problem decomposed into subtasks.]
Slide 25: Task decomposition
[Figure: modules M_11, M_12, M_21, M_22 combined through AND and OR gates.]
Slide 26: Task decomposition
[Figure: the input feeds modules M_11, M_12, M_21, M_22; MIN units combine M_11 with M_12 and M_21 with M_22, and a MAX unit combines the two MIN outputs into the class output C_1. A sketch of this MIN/MAX combination follows below.]
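The MIN/MAX combination itself is only a few lines of code. A minimal sketch, assuming each module outputs a confidence value in [0, 1]; the function name and the example values are illustrative, the module names follow the figure:

    def min_max_combine(module_outputs, groups):
        """Combine two-class module outputs with the MIN/MAX scheme:
        MIN inside each group of modules, MAX across the groups."""
        min_units = [min(module_outputs[name] for name in group) for group in groups]
        return max(min_units)

    # illustrative use with the module names of the figure
    outputs = {"M11": 0.9, "M12": 0.7, "M21": 0.4, "M22": 0.8}
    c1 = min_max_combine(outputs, groups=[["M11", "M12"], ["M21", "M22"]])
    print(c1)   # MIN(0.9, 0.7) = 0.7 and MIN(0.4, 0.8) = 0.4, so MAX gives 0.7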
Slide 27: Task decomposition
Training set decomposition
- Original training set
- A training set for each of the K two-class problems
- Each two-class problem is divided into K-1 smaller two-class problems (using an inverter module, (K-1)/2 of them are actually enough)
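The decomposed training sets were given as formulas on the slide. A plausible reconstruction, with assumed (not original) notation:

- Original training set: $T = \{(x_l, d_l)\}_{l=1}^{N}$ with class labels $d_l \in \{1, \dots, K\}$.
- Coarse decomposition: $T_i = \{(x_l, y_l^{i})\}$ with $y_l^{i} = 1$ if $d_l = i$ and $y_l^{i} = 0$ otherwise, one set per class.
- Fine decomposition: $T_{ij}$ ($j \ne i$) keeps only the samples of classes $i$ and $j$, class $i$ labelled 1 and class $j$ labelled 0. Since the module trained on $T_{ij}$ plus an inverter realizes the module for $T_{ji}$, only $K(K-1)/2$ distinct pairwise modules are needed in total.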
Slide 28: Task decomposition
A practical example: zip code recognition
[Figure: preprocessing pipeline. The 16 x 16 input digit image is normalized; edge detection with Kirsch masks (horizontal, diagonal \, diagonal /, vertical) yields 4 feature maps of 16 x 16, which are reduced to 4 matrices of 8 x 8.]
Slide 29: Task decomposition
Zip code recognition (handwritten character recognition), modular solution
- 45 = K(K-1)/2 neurons (K = 10)
- 10 AND gates (MIN operator)
- 256+1 inputs
Slide 30: Mixture of Experts (MOE)
[Figure: the input x feeds the expert networks Expert 1 ... Expert M, with outputs μ_1 ... μ_M, and a gating network with outputs g_1 ... g_M; the expert outputs, weighted by the gating outputs, are summed to give the overall output.]
Slide 31: Mixture of Experts (MOE)
- The output is the weighted sum of the outputs of the experts; each expert has its own parameter vector
- The output of the gating network is computed with the "softmax" function of the gating network's parameters
(formulas on the slide; reconstructed below)
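A reconstruction of the usual MOE equations, with assumed symbols ($\theta_i$ for the parameters of the i-th expert, $v_i$ for the gating parameters):

$y(x) = \sum_{i=1}^{M} g_i(x)\, \mu_i(x, \theta_i)$  (weighted sum of the expert outputs)
$g_i(x) = \dfrac{\exp(v_i^{T} x)}{\sum_{j=1}^{M} \exp(v_j^{T} x)}$  (softmax output of the gating network)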
Slide 32: Mixture of Experts (MOE)
Probabilistic interpretation: the probabilistic model with the true parameters; the gating outputs play the role of a priori probabilities (formulas on the slide; reconstructed below)
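In the standard MOE probabilistic model (assumed here), the probability of the desired response is a mixture:

$P(d \mid x, \Theta) = \sum_{i=1}^{M} g_i(x, v)\, P(d \mid x, \theta_i)$

where $g_i(x, v) = P(i \mid x, v)$ is the a priori probability that expert $i$ is responsible for the given input, and $\Theta$ collects the true parameters of all experts and of the gating network.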
Slide 33: Mixture of Experts (MOE)
Training
- Training data
- Probability of generating the output from the input
- The log-likelihood function (maximum-likelihood estimation)
(formulas on the slide; reconstructed below)
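With training data $\{(x(l), d(l))\}_{l=1}^{N}$ and the mixture model above, the log-likelihood to be maximized is (reconstruction, same assumed notation):

$L(\Theta) = \sum_{l=1}^{N} \log \sum_{i=1}^{M} g_i(x(l), v)\, P\big(d(l) \mid x(l), \theta_i\big)$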
Slide 34: Mixture of Experts (MOE)
Training (cont'd): gradient method; update rules for the parameters of the expert networks and of the gating network (formulas on the slide)
Slide 35: Mixture of Experts (MOE)
Training (cont'd): a priori probability and a posteriori probability (formulas on the slide; reconstructed below)
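A reconstruction of the a posteriori probability and of the resulting gradient updates, in the usual Jacobs-Jordan form (assumed, not taken verbatim from the slides):

$h_i(l) = \dfrac{g_i(x(l), v)\, P(d(l) \mid x(l), \theta_i)}{\sum_{j} g_j(x(l), v)\, P(d(l) \mid x(l), \theta_j)}$  (a posteriori probability that expert $i$ produced sample $l$)

$\Delta \theta_i \propto \sum_{l} h_i(l)\, \dfrac{\partial}{\partial \theta_i} \log P(d(l) \mid x(l), \theta_i)$, and, for a linear (softmax) gating network, $\Delta v_i \propto \sum_{l} \big(h_i(l) - g_i(x(l), v)\big)\, x(l)$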
Slide 36: Mixture of Experts (MOE)
Training (cont'd): the EM (Expectation-Maximization) algorithm, a general iterative technique for maximum-likelihood estimation
- Introducing hidden variables
- Defining a log-likelihood function
- Two steps: expectation of the hidden variables; maximization of the log-likelihood function
Slide 37: EM (Expectation-Maximization) algorithm
A simple example: estimating the means of k (= 2) Gaussians
[Figure: the two Gaussian densities f(y | μ_1) and f(y | μ_2) plotted over the measurements.]
Slide 38: EM (Expectation-Maximization) algorithm
A simple example: estimating the means of k (= 2) Gaussians
- Hidden variables for every observation: (x(l), z_l1, z_l2)
- Likelihood function and log-likelihood function
- Expected value of the hidden variables given the current estimate of the means (formulas on the slide)
Slide 39: Mixture of Experts (MOE)
A simple example: estimating the means of k (= 2) Gaussians
- Expected log-likelihood function
- The estimate of the means
(formulas on the slide; a sketch of the full procedure follows below)
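The slide formulas were not captured; as an illustration of the two EM steps on this example, here is a minimal sketch in Python. It assumes known, equal variances and equal mixing weights, and all names are illustrative rather than taken from the slides:

    import numpy as np

    def em_gaussian_means(y, k=2, sigma=1.0, n_iter=50, seed=0):
        """Estimate the means of k Gaussians with known, equal variance
        and equal mixing weights, using the EM algorithm."""
        rng = np.random.default_rng(seed)
        mu = rng.choice(y, size=k, replace=False)        # initial guesses for the means
        for _ in range(n_iter):
            # E-step: expected hidden variables z[l, j] = P(component j | y[l], mu)
            sq = (y[:, None] - mu[None, :]) ** 2
            w = np.exp(-sq / (2 * sigma ** 2))
            z = w / w.sum(axis=1, keepdims=True)
            # M-step: each mean becomes the z-weighted average of the data
            mu = (z * y[:, None]).sum(axis=0) / z.sum(axis=0)
        return mu

    # toy data drawn from two Gaussians with means -2 and +3
    rng = np.random.default_rng(1)
    y = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 300)])
    print(em_gaussian_means(y))   # should end up close to [-2, 3] (order may differ)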
Slide 40: Mixture of Experts (MOE)
Applications
- Simple experts: linear experts
- ECG diagnostics
- Mixture of Kalman filters
- Discussion: comparison to a non-modular architecture
Slide 41: Support vector machines
A new approach that gives answers to questions not settled by the classical approach:
- The size of the network
- The generalization capability
Slide 42: Support vector machines
Classification
[Figure: comparison of classical neural learning and the Support Vector Machine; the SVM chooses the optimal separating hyperplane.]
Slide 43: VC dimension (figure and formulas on the slide)
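The slide content itself was not captured in the text export; for reference, the VC dimension of a class of classifier functions is the largest number of points $h$ that the class can shatter, i.e. separate into two classes in all $2^{h}$ possible ways. For example, a linear separator in $\mathbb{R}^{n}$ has VC dimension $n + 1$.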
Slide 44: Structural error (risk) minimization (figure and formulas on the slide)
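The guaranteed-risk formula usually shown with this topic (assumed here, the standard Vapnik bound, not read from the slide): with probability $1 - \eta$,

$R(w) \le R_{emp}(w) + \sqrt{\dfrac{h\big(\ln(2N/h) + 1\big) - \ln(\eta/4)}{N}}$

where $R_{emp}$ is the empirical risk, $N$ the number of training samples and $h$ the VC dimension; structural risk minimization selects the structure that minimizes this bound rather than the empirical risk alone.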
Slide 45: Support vector machines
Linearly separable two-class problem: separating hyperplane, optimal hyperplane (formulas on the slide; see the reconstruction below)
Slide 46: Support vector machines
Geometric interpretation
[Figure: the decision function d(x) over the input components x_1, x_2.]
Slide 47: Support vector machines
- Criterion function, Lagrange function: a constrained optimization problem
- Conditions, dual problem
- Support vectors, optimal hyperplane
(formulas on the slide; reconstructed below)
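A reconstruction of the separable-case formulation, with assumed notation ($d_i \in \{-1, +1\}$ the class labels, following the deck's use of $d$ for the desired response):

Primal: minimize $\tfrac{1}{2}\lVert w \rVert^{2}$ subject to $d_i (w^{T} x_i + b) \ge 1$; the margin that is maximized is $2 / \lVert w \rVert$.
Lagrangian: $L(w, b, \alpha) = \tfrac{1}{2}\lVert w \rVert^{2} - \sum_i \alpha_i \big[d_i (w^{T} x_i + b) - 1\big]$, with $\alpha_i \ge 0$.
Dual: maximize $\sum_i \alpha_i - \tfrac{1}{2}\sum_i \sum_j \alpha_i \alpha_j d_i d_j\, x_i^{T} x_j$ subject to $\sum_i \alpha_i d_i = 0$, $\alpha_i \ge 0$.
Optimal hyperplane: $w = \sum_i \alpha_i d_i x_i$; the support vectors are the training points with $\alpha_i > 0$.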
Slide 48: Support vector machines
Linearly nonseparable case: separating hyperplane, criterion function, Lagrange function, support vectors, optimal hyperplane (formulas on the slide; reconstructed below)
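A reconstruction under the same assumed notation, with slack variables $\xi_i$ and trade-off constant $C$:

Minimize $\tfrac{1}{2}\lVert w \rVert^{2} + C \sum_i \xi_i$ subject to $d_i (w^{T} x_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$. The dual problem is the same as in the separable case except that the multipliers are box-constrained: $0 \le \alpha_i \le C$.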
Slide 49: Support vector machines
Nonlinear separation: separating hyperplane, decision surface, kernel function, criterion function (formulas on the slide; reconstructed below)
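A reconstruction with an assumed feature mapping $\varphi(\cdot)$: the input is mapped to a feature space, $x \mapsto \varphi(x)$, and the kernel function is the inner product there, $K(x, x_i) = \varphi(x)^{T} \varphi(x_i)$. The decision surface becomes

$d(x) = \sum_i \alpha_i d_i\, K(x, x_i) + b$

and the dual criterion is unchanged except that $x_i^{T} x_j$ is replaced by $K(x_i, x_j)$.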
Slide 50: Support vector machines
Examples of SVM kernels
- Polynomial
- RBF
- MLP
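The kernel formulas themselves were images on the slide; the usual forms (assumed) are:

Polynomial: $K(x, x_i) = (x^{T} x_i + 1)^{p}$
RBF: $K(x, x_i) = \exp\big(-\lVert x - x_i \rVert^{2} / (2\sigma^{2})\big)$
MLP (sigmoid): $K(x, x_i) = \tanh(\beta_0\, x^{T} x_i + \beta_1)$, a valid kernel only for certain $\beta_0, \beta_1$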
Slide 51: Support vector machines
Example: polynomial basis functions and the corresponding kernel function (formulas on the slide; a worked case below)
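A small worked case (illustration, not from the slide): in two dimensions with $p = 2$, the feature vector $\varphi(x) = \big(1, \sqrt{2}\,x_1, \sqrt{2}\,x_2, x_1^{2}, \sqrt{2}\,x_1 x_2, x_2^{2}\big)$ satisfies $\varphi(x)^{T}\varphi(z) = (1 + x^{T} z)^{2}$, so the second-order polynomial kernel evaluates the inner product of the 6-dimensional polynomial basis expansion without ever forming it.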
Slide 52: SVR (classification)
Separable samples: minimize, constraint; not separable samples: minimize, constraint (formulas on the slide).
By minimizing the weight norm we maximize the distance between the classes, while we also control the VC dimension.
Slide 53: SVR (regression)
[Figure: the cost function C(·) used in support vector regression.]
Slide 54: SVR (regression)
Minimize the criterion function subject to the constraints (formulas on the slide; reconstructed below).
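A reconstruction, assuming the standard ε-insensitive formulation with slack variables $\xi_i, \xi_i'$:

Minimize $\tfrac{1}{2}\lVert w \rVert^{2} + C \sum_i (\xi_i + \xi_i')$
subject to $d_i - w^{T} x_i - b \le \varepsilon + \xi_i$, $\; w^{T} x_i + b - d_i \le \varepsilon + \xi_i'$, $\; \xi_i, \xi_i' \ge 0$

which corresponds to an ε-insensitive cost $C(e) = \max(0, |e| - \varepsilon)$, presumably the cost function plotted on slide 53.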
Slide 55: SVR (regression)
Lagrange function, dual problem, constraints, support vectors, solution (formulas on the slide; reconstructed below)
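A reconstruction of the dual and of the solution, under the same assumptions (kernel $K$, multipliers $\alpha_i, \alpha_i'$ for the two constraint sets):

Dual: maximize $\sum_i d_i(\alpha_i - \alpha_i') - \varepsilon \sum_i (\alpha_i + \alpha_i') - \tfrac{1}{2}\sum_i\sum_j (\alpha_i - \alpha_i')(\alpha_j - \alpha_j')\, K(x_i, x_j)$
subject to $\sum_i (\alpha_i - \alpha_i') = 0$ and $0 \le \alpha_i, \alpha_i' \le C$.
Solution: $y(x) = \sum_i (\alpha_i - \alpha_i')\, K(x, x_i) + b$; the support vectors are the samples with $\alpha_i - \alpha_i' \ne 0$.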
Slides 56-59: SVR (regression) [figures]
Slide 60: Support vector machines
Main advantages
- Generalization
- Size of the network
- Centre parameters for RBF
- Linear-in-the-parameters structure
- Noise immunity
Slide 61: Support vector machines
Main disadvantages
- Computation intensive (quadratic optimization)
- Hyperparameter selection
- VC dimension (classification)
- Batch processing
Slide 62: Support vector machines
Variants: LS-SVM, basic criterion function (formula on the slide; reconstructed below)
Advantages: easier to compute; adaptivity
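A reconstruction of the LS-SVM criterion, with assumed notation:

Minimize $\tfrac{1}{2}\lVert w \rVert^{2} + \tfrac{\gamma}{2}\sum_i e_i^{2}$ subject to the equality constraints $d_i = w^{T}\varphi(x_i) + b + e_i$.

Because the inequality constraints of the standard SVM are replaced by equalities and the loss is quadratic, the solution follows from a set of linear equations instead of a quadratic programming problem, which is what makes the variant easier to compute and amenable to adaptive (recursive) updating.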
Slide 63: Mixture of SVMs
- The problem of hyper-parameter selection for SVMs
- Different SVMs with different hyper-parameters
- Soft separation of the input space
Slide 64: Mixture of SVMs [figure]
Slide 65: Boosting techniques
- Boosting by filtering
- Boosting by subsampling
- Boosting by reweighting (a sketch follows slide 68 below)
Slide 66: Boosting techniques
Boosting by filtering [figure]
Slide 67: Boosting techniques
Boosting by subsampling [figure]
Slide 68: Boosting techniques
Boosting by reweighting [figure]
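Since the figure for boosting by reweighting was not captured, here is a minimal AdaBoost-style sketch of the idea: after each round the weights of misclassified samples are increased, so the next weak learner concentrates on them. Decision stumps serve as weak learners; all names and the stump learner itself are illustrative, not taken from the slides:

    import numpy as np

    def boost_by_reweighting(X, y, n_rounds=20):
        """AdaBoost-style boosting by reweighting with decision stumps.
        X: (N, d) feature matrix, y: (N,) labels in {-1, +1}."""
        N = len(y)
        w = np.full(N, 1.0 / N)                       # uniform sample weights to start
        stumps, alphas = [], []
        for _ in range(n_rounds):
            best = None
            # weak learner: the stump (feature, threshold, sign) with the
            # smallest weighted error under the current sample weights
            for j in range(X.shape[1]):
                for thr in np.unique(X[:, j]):
                    for s in (+1, -1):
                        pred = np.where(X[:, j] > thr, s, -s)
                        err = w[pred != y].sum()
                        if best is None or err < best[0]:
                            best = (err, j, thr, s)
            err, j, thr, s = best
            err = np.clip(err, 1e-10, 1 - 1e-10)
            alpha = 0.5 * np.log((1 - err) / err)     # vote of this weak learner
            pred = np.where(X[:, j] > thr, s, -s)
            w *= np.exp(-alpha * y * pred)            # reweight: boost the misclassified samples
            w /= w.sum()
            stumps.append((j, thr, s))
            alphas.append(alpha)

        def predict(Xq):
            votes = sum(a * np.where(Xq[:, j] > thr, s, -s)
                        for (j, thr, s), a in zip(stumps, alphas))
            return np.sign(votes)
        return predict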
Slides 69-70: Other modular architectures [figures]
Slide 71: Other modular architectures
Modular classifiers
- Decoupled modules
- Hierarchical modules
- Network ensemble (linear combination)
- Network ensemble (decision, voting)
Slide 72: Modular architectures [figure]