Slide 1: Identification and Neural Networks. G. Horváth, Department of Measurement and Information Systems

Slide 2: Modular networks: why a modular approach? Motivations:
- Biological
- Learning
- Computational
- Implementation

Slide 3: Motivations: biological
- Biological systems are not homogeneous
- Functional specialization
- Fault tolerance
- Cooperation, competition
- Scalability
- Extensibility

Slide 4: Motivations: complexity of learning (divide and conquer)
- Training of a complex network (many layers) by layer-by-layer learning
- Speed of learning
- Catastrophic interference, incremental learning
- Mixing supervised and unsupervised learning
- Hierarchical knowledge structure

Slide 5: Motivations: computational
- The capacity of a network
- The size of the network
- Catastrophic interference
- Generalization capability vs. network complexity

Slide 6: Motivations: implementation (hardware)
- The degree of parallelism
- Number of connections
- The length of physical connections
- Fan-out

Slide 7: Modular networks: what are the modules? The modules disagree on some inputs.
- Every module solves the same, whole problem, but in a different way (different modules)
- Every module solves a different task (sub-task): task decomposition (input space, output space)

Slide 8: Modular networks: how to combine modules
Cooperative modules:
- simple average
- weighted average (fixed weights): optimal linear combination (OLC) of networks
Competitive modules:
- majority vote
- winner takes all
Competitive/cooperative modules:
- weighted average (input-dependent weights): mixture of experts (MOE)
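A minimal Python sketch of the combination rules listed above (not from the slides; the function names are illustrative), assuming each module's output is already available as a NumPy array:

```python
import numpy as np

def simple_average(outputs):
    """Cooperative combination: plain mean of the module outputs (M x D array)."""
    return np.mean(outputs, axis=0)

def weighted_average(outputs, alphas):
    """Cooperative combination with fixed weights (optimal linear combination)."""
    return np.tensordot(alphas, outputs, axes=1)   # sum_k alpha_k * y_k

def majority_vote(class_labels):
    """Competitive combination for classifiers: the most frequent label wins."""
    values, counts = np.unique(class_labels, return_counts=True)
    return values[np.argmax(counts)]

def mixture_of_experts(outputs, gates):
    """Competitive/cooperative combination with input-dependent weights."""
    return np.tensordot(gates, outputs, axes=1)    # gates depend on the input x
```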

Slide 9: Modular networks: constructing modular networks
- Task decomposition, subtask definition
- Training modules to solve the subtasks
- Integration of the results (cooperation and/or competition)

Slide 10: Modular networks
Cooperative networks:
- Ensemble (average)
- Optimal linear combination of networks
- Disjoint subtasks
Competitive networks:
- Ensemble (vote)
Competitive/cooperative networks:
- Mixture of experts

Slide 11: Cooperative networks: ensemble of cooperating networks (classification/regression). The motivation:
- Heuristic explanation: different experts together can solve a problem better; complementary knowledge
- Mathematical justification: accurate and diverse modules

Slide 12: Ensemble of networks: mathematical justification
- Ensemble output
- Ambiguity (diversity)
- Individual error
- Ensemble error
- Constraint

Slide 13: Ensemble of networks: mathematical justification (cont'd)
- Weighted error
- Weighted diversity
- Ensemble error
- Averaging over the input distribution
Solution: an ensemble of accurate and diverse networks
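The equations behind slides 12 and 13 are not in the transcript. A hedged reconstruction, assuming the slides follow the standard ambiguity decomposition for an ensemble with weights α_k (summing to one, nonnegative), module outputs y_k(x) and target d(x):

```latex
\bar{y}(x) = \sum_{k} \alpha_k\, y_k(x), \qquad
a_k(x) = \bigl(y_k(x) - \bar{y}(x)\bigr)^2, \qquad
e_k(x) = \bigl(d(x) - y_k(x)\bigr)^2

\bigl(d(x) - \bar{y}(x)\bigr)^2
  = \sum_k \alpha_k\, e_k(x) \;-\; \sum_k \alpha_k\, a_k(x)
```

Averaging over the input distribution, the ensemble error is the weighted average individual error minus the weighted average diversity, which is why an ensemble of accurate and diverse networks helps.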

Slide 14: Ensemble of networks: how to get accurate and diverse networks
- different structures: more than one network structure (e.g. MLP, RBF, CCN, etc.)
- different size, different complexity (number of hidden units, number of layers, nonlinear function, etc.)
- different learning strategies (BP, CG, random search, etc.); batch learning vs. sequential learning
- different training algorithms, sample order, training samples
- different training parameters
- different starting parameter values
- different stopping criteria

Slide 15: Linear combination of networks. [Block diagram: the input x feeds networks NN 1 ... NN M with outputs y 1 ... y M; the outputs are combined as a weighted sum Σ with coefficients α 0 (bias term, y 0 = 1), α 1, ..., α M.]

Slide 16: Linear combination of networks: computation of the optimal coefficients
- simple average
- the coefficient α k depends on the input: for different input domains a different network alone gives the output
- optimal values using the constraint
- optimal values without any constraint: Wiener-Hopf equation
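The formulas for the optimal coefficients did not survive the transcript. For the unconstrained case the slide names the Wiener-Hopf equation; a hedged reconstruction, with output vector y = [y_0, y_1, ..., y_M]^T (including the bias output y_0 = 1) and desired response d:

```latex
\min_{\boldsymbol{\alpha}}\ E\bigl\{(d - \boldsymbol{\alpha}^T \mathbf{y})^2\bigr\}
\;\;\Rightarrow\;\;
\mathbf{R}\,\boldsymbol{\alpha}^{*} = \mathbf{p},
\qquad
\mathbf{R} = E\{\mathbf{y}\mathbf{y}^T\},\quad \mathbf{p} = E\{d\,\mathbf{y}\}
```

With the constraint that the coefficients sum to one, the optimum can be obtained in the same way using a Lagrange multiplier; the simple average corresponds to equal coefficients 1/M.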

Slide 17: Task decomposition
Decomposition related to learning:
- before learning (subtask definition)
- during learning (automatic task decomposition)
Problem space decomposition:
- input space (input space clustering, definition of different input regions)
- output space (desired response)

Slide 18: Task decomposition: decomposition into separate subproblems
- K-class classification into K two-class problems (coarse decomposition)
- Complex two-class problems into smaller two-class problems (fine decomposition)
- Integration (module combination)

Slide 19: Task decomposition: a 3-class problem (figure)

Slide 20: Task decomposition. [Figure: 3 classes and 2 small classes.]

Slide 21: Task decomposition. [Figure: 3 classes, 2 classes, 2 small classes.]

Slide 22: Task decomposition. [Figure: 3 classes and 2 small classes.]

Slide 23: Task decomposition. [Block diagram: the input feeds the pairwise modules M 12, M 13, M 23; MIN units combine the module outputs (and inverted outputs, INV) to produce the class outputs C 1, C 2, C 3.]

Slide 24: Task decomposition: a two-class problem decomposed into subtasks (figure)

Slide 25: Task decomposition. [Block diagram: sub-modules M 11, M 12, M 21, M 22; AND gates combine pairs of module outputs and an OR gate combines the AND outputs.]

Slide 26: Task decomposition. [Block diagram: the input feeds sub-modules M 11, M 12, M 21, M 22; MIN units combine M 11 with M 12 and M 21 with M 22, and a MAX unit combines the two MIN outputs to give C 1.]
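A minimal sketch of the MIN/MAX integration shown on slides 25-26, assuming each sub-module returns a score in [0, 1] and that each row of the score matrix is one group combined by the soft AND (MIN); the grouping and names are illustrative:

```python
import numpy as np

def combine_min_max(module_scores):
    """Combine sub-module outputs for one class.

    module_scores: 2-D array; each row is MIN-combined (soft AND),
    then the row results are MAX-combined (soft OR).
    """
    return np.max(np.min(module_scores, axis=1))

# Example: scores of (M11, M12) in the first row and (M21, M22) in the second
scores = np.array([[0.9, 0.2],
                   [0.7, 0.8]])
c1 = combine_min_max(scores)   # MIN per row -> [0.2, 0.7], MAX -> 0.7
```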

Slide 27: Task decomposition: training set decomposition
- Original training set
- A training set for each of the K two-class problems
- Each two-class problem is divided into K-1 smaller two-class problems [using an inverter module, half of these are actually enough]

Slide 28: Task decomposition. A practical example: zip code recognition. [Preprocessing pipeline: input digit image (16 x 16), normalization, edge detection with Kirsch masks (horizontal, vertical, diagonal \ and diagonal /), giving 4 feature maps of 16 x 16, reduced to 4 matrices of 8 x 8.]

Slide 29: Task decomposition: zip code recognition (handwritten character recognition), modular solution
- 256+1 inputs
- 45 = K(K-1)/2 pairwise modules (neurons), with K = 10
- 10 AND gates (MIN operator)

Slide 30: Mixture of Experts (MOE). [Block diagram: the input x feeds Expert 1 ... Expert M (outputs μ 1 ... μ M) and a gating network (outputs g 1 ... g M); the overall output is the gated sum Σ.]

Slide 31: Mixture of Experts (MOE). The output is the weighted sum of the outputs of the experts, where θ i denotes the parameters of the i-th expert. The outputs of the gating network are given by the "softmax" function, where θ g denotes the parameters of the gating network.
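The slide's formulas are missing from the transcript. A hedged reconstruction of the standard MOE output and softmax gating described above, with expert parameters θ i, gating parameters θ g and gating pre-activations s i (the symbol names are assumptions):

```latex
y(\mathbf{x}) = \sum_{i=1}^{M} g_i(\mathbf{x})\,\mu_i(\mathbf{x}, \boldsymbol{\theta}_i),
\qquad
g_i(\mathbf{x}) = \frac{\exp\bigl(s_i(\mathbf{x}, \boldsymbol{\theta}_g)\bigr)}
                      {\sum_{j=1}^{M}\exp\bigl(s_j(\mathbf{x}, \boldsymbol{\theta}_g)\bigr)}
```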

Slide 32: Mixture of Experts (MOE)
- Probabilistic interpretation
- The probabilistic model with the true parameters
- A priori probability
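The model equation itself is missing; in the standard MOE interpretation the gating outputs act as a priori probabilities of selecting the experts, so a hedged reconstruction of the probabilistic model with true parameters Θ is:

```latex
P(\mathbf{d} \mid \mathbf{x}, \boldsymbol{\Theta})
  = \sum_{i=1}^{M} g_i(\mathbf{x}, \boldsymbol{\theta}_g)\, P(\mathbf{d} \mid \mathbf{x}, \boldsymbol{\theta}_i)
```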

Slide 33: Mixture of Experts (MOE): training
- Training data
- Probability of generating the output from the input
- The log-likelihood function (maximum likelihood estimation)
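A hedged reconstruction of the log-likelihood over training data {(x(l), d(l)), l = 1, ..., P}, following the model above:

```latex
L(\boldsymbol{\Theta}) = \sum_{l=1}^{P} \ln \sum_{i=1}^{M}
  g_i\bigl(\mathbf{x}^{(l)}, \boldsymbol{\theta}_g\bigr)\,
  P\bigl(\mathbf{d}^{(l)} \mid \mathbf{x}^{(l)}, \boldsymbol{\theta}_i\bigr)
```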

Slide 34: Mixture of Experts (MOE): training (cont'd)
- Gradient method
- The parameters of the expert networks
- The parameters of the gating network

Slide 35: Mixture of Experts (MOE): training (cont'd)
- A priori probability
- A posteriori probability
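A hedged reconstruction of the a posteriori probability (the responsibility of expert i for a training pair), which is the quantity that drives both the gradient and the EM updates:

```latex
h_i(\mathbf{x}, \mathbf{d}) =
  \frac{g_i(\mathbf{x}, \boldsymbol{\theta}_g)\, P(\mathbf{d} \mid \mathbf{x}, \boldsymbol{\theta}_i)}
       {\sum_{j=1}^{M} g_j(\mathbf{x}, \boldsymbol{\theta}_g)\, P(\mathbf{d} \mid \mathbf{x}, \boldsymbol{\theta}_j)}
```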

Slide 36: Mixture of Experts (MOE): training (cont'd)
- EM (Expectation-Maximization) algorithm: a general iterative technique for maximum likelihood estimation; introduce hidden variables and define a log-likelihood function
- Two steps: expectation of the hidden variables; maximization of the log-likelihood function

Slide 37: EM (Expectation-Maximization) algorithm. A simple example: estimating the means of k (= 2) Gaussians. [Figure: the measurements together with the two densities f(y|μ 1) and f(y|μ 2).]

Slide 38: EM (Expectation-Maximization) algorithm. A simple example: estimating the means of k (= 2) Gaussians
- hidden variables for every observation: (x(l), z i1, z i2)
- likelihood function
- log-likelihood function
- expected value of the hidden variables given the current parameter estimates

Slide 39: Mixture of Experts (MOE). A simple example: estimating the means of k (= 2) Gaussians
- Expected log-likelihood function
- The estimate of the means
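A minimal runnable sketch of the EM iteration for this example, assuming two Gaussians with known, equal (unit) variances and equal priors so that only the means are estimated; the synthetic data and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D data drawn from two unit-variance Gaussians
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])

mu = np.array([0.0, 1.0])            # initial guesses for the two means
for _ in range(50):
    # E-step: expected value of the hidden indicators z_ik (responsibilities)
    dens = np.exp(-0.5 * (x[:, None] - mu[None, :]) ** 2)   # unnormalized N(x | mu_k, 1)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate the means as responsibility-weighted averages
    mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)

print(mu)   # should converge close to (-2, 3)
```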

Slide 40: Mixture of Experts (MOE): applications
- Simple experts: linear experts
- ECG diagnostics
- Mixture of Kalman filters
Discussion: comparison to a non-modular architecture

Slide 41: Support vector machines. A new approach: it gives answers to questions not solved by the classical approach:
- The size of the network
- The generalization capability

Slide 42: Support vector machines: classification, optimal hyperplane. [Figure: classical neural learning contrasted with the Support Vector Machine.]

Slide 43: VC dimension (figure)

Slide 44: Structural risk minimization (figure)

Slide 45: Support vector machines: linearly separable two-class problem
- separating hyperplane
- optimal hyperplane

Slide 46: Support vector machines: geometric interpretation. [Figure: the decision function d(x) over the (x 1, x 2) input plane.]

Slide 47: Support vector machines: a constrained optimization problem
- criterion function, Lagrange function
- conditions
- dual problem
- support vectors, optimal hyperplane
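The formulas of this slide are not in the transcript. A hedged reconstruction of the standard formulation for the linearly separable case, with training pairs (x_i, d_i), d_i in {-1, +1}:

```latex
\min_{\mathbf{w}, b}\ \tfrac{1}{2}\|\mathbf{w}\|^2
\quad \text{s.t.} \quad d_i\,(\mathbf{w}^T\mathbf{x}_i + b) \ge 1

L(\mathbf{w}, b, \boldsymbol{\alpha}) = \tfrac{1}{2}\|\mathbf{w}\|^2
  - \sum_i \alpha_i\bigl[d_i(\mathbf{w}^T\mathbf{x}_i + b) - 1\bigr]

\max_{\boldsymbol{\alpha}}\ \sum_i \alpha_i
  - \tfrac{1}{2}\sum_i\sum_j \alpha_i\alpha_j d_i d_j\, \mathbf{x}_i^T\mathbf{x}_j
\quad \text{s.t.} \quad \alpha_i \ge 0,\quad \sum_i \alpha_i d_i = 0

\mathbf{w}^{*} = \sum_{i \in SV} \alpha_i d_i\, \mathbf{x}_i
\qquad (\text{support vectors: } \alpha_i > 0)
```

The optimal hyperplane is w*^T x + b* = 0, with margin 2/||w*||.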

Slide 48: Support vector machines: linearly non-separable case
- separating hyperplane
- criterion function, Lagrange function
- support vectors, optimal hyperplane
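The non-separable-case formulas are also missing; a hedged reconstruction of the standard soft-margin criterion, with slack variables ξ_i and trade-off constant C:

```latex
\min_{\mathbf{w}, b, \boldsymbol{\xi}}\ \tfrac{1}{2}\|\mathbf{w}\|^2 + C\sum_i \xi_i
\quad \text{s.t.} \quad d_i\,(\mathbf{w}^T\mathbf{x}_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0
```

The dual problem is the same as in the separable case except for the box constraint 0 <= α_i <= C.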

Slide 49: Support vector machines: nonlinear separation
- separating hyperplane
- decision surface
- kernel function
- criterion function
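A hedged reconstruction of the kernel form used for nonlinear separation, with feature map φ:

```latex
K(\mathbf{x}, \mathbf{x}_i) = \boldsymbol{\varphi}(\mathbf{x})^T \boldsymbol{\varphi}(\mathbf{x}_i),
\qquad
d(\mathbf{x}) = \operatorname{sign}\Bigl(\sum_{i \in SV} \alpha_i d_i\, K(\mathbf{x}, \mathbf{x}_i) + b\Bigr)
```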

Slide 50: Support vector machines: examples of SVM kernels
- Polynomial
- RBF
- MLP
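The kernel expressions themselves are missing from the transcript; the usual choices behind these three names (the parameter symbols are assumptions):

```latex
K(\mathbf{x}, \mathbf{x}_i) = (\mathbf{x}^T\mathbf{x}_i + 1)^p
\qquad
K(\mathbf{x}, \mathbf{x}_i) = \exp\!\left(-\frac{\|\mathbf{x} - \mathbf{x}_i\|^2}{2\sigma^2}\right)
\qquad
K(\mathbf{x}, \mathbf{x}_i) = \tanh(\beta_0\, \mathbf{x}^T\mathbf{x}_i + \beta_1)
```

The MLP (sigmoid) kernel satisfies Mercer's condition only for some values of β_0 and β_1.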

Slide 51: Support vector machines: example, polynomial kernel
- basis functions
- kernel function

Slide 52: SVM (classification)
- Separable samples: minimize the criterion subject to the separation constraint.
- Not separable samples: minimize the criterion, extended with slack variables, subject to the relaxed constraint.
By minimizing the norm of the weight vector we maximize the distance between the classes, while we also control the VC dimension.

Slide 53: SVR (regression). [Figure: the cost function C(·) used for regression.]

Slide 54: SVR (regression): minimize the criterion subject to the constraints (see the sketch below).
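The SVR criterion and constraints are not in the transcript; a hedged reconstruction of the standard ε-insensitive formulation with slack variables ξ_i and ξ_i*:

```latex
\min_{\mathbf{w}, b, \boldsymbol{\xi}, \boldsymbol{\xi}^*}\
  \tfrac{1}{2}\|\mathbf{w}\|^2 + C\sum_i (\xi_i + \xi_i^*)

\text{s.t.}\quad
d_i - \mathbf{w}^T\boldsymbol{\varphi}(\mathbf{x}_i) - b \le \varepsilon + \xi_i,
\quad
\mathbf{w}^T\boldsymbol{\varphi}(\mathbf{x}_i) + b - d_i \le \varepsilon + \xi_i^*,
\quad
\xi_i, \xi_i^* \ge 0
```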

Slide 55: SVR (regression)
- Lagrange function
- dual problem, constraints
- support vectors, solution
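A hedged reconstruction of the resulting dual solution, with Lagrange multipliers α_i and α_i* (nonzero only for the support vectors):

```latex
y(\mathbf{x}) = \sum_{i}(\alpha_i - \alpha_i^*)\, K(\mathbf{x}, \mathbf{x}_i) + b,
\qquad
0 \le \alpha_i, \alpha_i^* \le C,
\quad
\sum_i (\alpha_i - \alpha_i^*) = 0
```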

Slides 56-59: SVR (regression). [Figures only.]

Slide 60: Support vector machines: main advantages
- generalization
- size of the network
- centre parameters for RBF
- linear-in-the-parameters structure
- noise immunity

Slide 61: Support vector machines: main disadvantages
- computation intensive (quadratic optimization)
- hyperparameter selection, VC dimension (classification)
- batch processing

Slide 62: Support vector machines: variants
- LS-SVM
- basic criterion function
- Advantages: easier to compute; adaptivity
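The LS-SVM criterion itself is not in the transcript; a hedged reconstruction of the usual least-squares variant, which replaces the inequality constraints by equality constraints with error variables e_i and therefore leads to a linear system instead of quadratic programming:

```latex
\min_{\mathbf{w}, b, \mathbf{e}}\
  \tfrac{1}{2}\|\mathbf{w}\|^2 + \tfrac{\gamma}{2}\sum_i e_i^2
\quad \text{s.t.} \quad
d_i = \mathbf{w}^T\boldsymbol{\varphi}(\mathbf{x}_i) + b + e_i
```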

Slide 63: Mixture of SVMs: the problem of hyper-parameter selection for SVMs
- Different SVMs with different hyper-parameters
- Soft separation of the input space

Slide 64: Mixture of SVMs (figure)

Slide 65: Boosting techniques
- Boosting by filtering
- Boosting by subsampling
- Boosting by reweighting

Slide 66: Boosting techniques: boosting by filtering (figure)

Slide 67: Boosting techniques: boosting by subsampling (figure)

Slide 68: Boosting techniques: boosting by reweighting (figure)
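The slides do not give the reweighting rule itself; as an illustration only, here is a minimal AdaBoost-style sketch of boosting by reweighting (the slides do not necessarily use AdaBoost; the names and the choice of decision stumps are assumptions):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_by_reweighting(X, y, n_rounds=10):
    """AdaBoost-style boosting by reweighting with decision stumps.

    y must contain labels in {-1, +1}.  Returns a list of (weight, stump) pairs.
    """
    n = len(y)
    w = np.full(n, 1.0 / n)                  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(np.sum(w[pred != y]), 1e-10, 1 - 1e-10)   # weighted error
        alpha = 0.5 * np.log((1 - err) / err)
        ensemble.append((alpha, stump))
        w *= np.exp(-alpha * y * pred)       # up-weight the misclassified samples
        w /= w.sum()
    return ensemble

def predict(ensemble, X):
    """Weighted vote of the boosted stumps."""
    return np.sign(sum(a * s.predict(X) for a, s in ensemble))
```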

Slides 69-70: Other modular architectures. [Figures only.]

Slide 71: Other modular architectures: modular classifiers
- Decoupled modules
- Hierarchical modules
- Network ensemble (linear combination)
- Network ensemble (decision, voting)

Slide 72: Modular architectures (figure)

