
1 Computer Vision Laboratory 1 Hiperlearn: A Channel Learning Architecture Gösta Granlund Computer Vision Laboratory Linköping University SWEDEN

2 Computer Vision Laboratory 2 Requirements on systems A large number of input and output variables A highly non-linear mapping, which in parts is locally continuous and in other parts of a transitional, switching character A mapping which requires interpolation between a large number of models

3 Computer Vision Laboratory 3 Challenge It would be of great benefit if complex procedures could be designed through learning of the complex and usually non-linear relationships.

4 Computer Vision Laboratory 4 Architecture A learning system architecture using a channel information representation This representation implies that signals are monopolar and local Monopolar means that data only utilizes one polarity, allowing zero to represent no information

5 Computer Vision Laboratory 5 Architecture II Locality derives from a partially overlapping mapping of signals into a higher-dimensional space The combination of monopolarity and locality leads to a sparse representation Locality of features allows the generation of highly non-linear functions using a linear mapping.

6 Computer Vision Laboratory 6 The locality obtained in this channel representation gives two advantages: Nonlinear functions and combinations can be implemented using linear mappings. Optimization in learning converges much faster.
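A minimal sketch of what such a monopolar, local channel encoding could look like. The cos^2 kernel, the number of channels, and the value range are illustrative assumptions (the kernel is one common choice in the channel-representation literature), not the deck's exact definition:

```python
import numpy as np

def channel_encode(x, n_channels=8, lo=0.0, hi=1.0):
    """Encode a scalar x into a monopolar, local channel vector.

    Each channel responds with a cos^2 kernel centred on its own
    position; only a few neighbouring channels are non-zero, which
    gives the sparse, local representation described above.
    (Kernel shape and spacing are illustrative assumptions.)
    """
    centers = np.linspace(lo, hi, n_channels)
    width = (hi - lo) / (n_channels - 1)      # distance between channel centres
    d = (x - centers) / width                 # signed distance in channel units
    return np.where(np.abs(d) < 1.5, np.cos(np.pi * d / 3.0) ** 2, 0.0)

a = channel_encode(0.37)
print(np.round(a, 3))   # only two or three adjacent channels are active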

7 Computer Vision Laboratory 7 Overview

8 Computer Vision Laboratory 8 State mapping

9 Computer Vision Laboratory 9 Mapping

10 Computer Vision Laboratory 10

11 Computer Vision Laboratory 11

12 Computer Vision Laboratory 12

13 Computer Vision Laboratory 13

14 Computer Vision Laboratory 14 Mapping

15 Computer Vision Laboratory 15 System operation modes The channel learning architecture can be run under two different operation modes, providing output in two different representations: Position encoding for discrete event mapping. Magnitude encoding for continuous function mapping.

16 Computer Vision Laboratory 16 Discrete event mapping

17 Computer Vision Laboratory 17 Discrete event mapping There are a number of characteristics of this mode: Mapping is made to sets of output channels, whose response functions may be partially overlapping to allow the reconstruction of a continuous variable. Output channels are expected to take on some standard maximum value, say 1, but are expected to be zero most of the time, to allow a sparse representation.

18 Computer Vision Laboratory 18 Discrete event mapping II The system state is not given by the magnitude of a single output channel, but is given by the relation between outputs of adjacent channels. Relatively few feature functions, or sometimes only a single feature function, are expected to map onto a particular output channel.

19 Computer Vision Laboratory 19 Discrete event mapping III The channel representation of a signal allows a unified representation of signal value and of signal confidence, where the relation between channel values represents value, and the magnitude represents confidence. Since the discrete event mode implies that both the feature and response state vectors are in the channel representation, the confidence of the feature vector will be propagated to the response vector if the mapping is linear.

20 Computer Vision Laboratory 20 The properties just listed allow the structure to be implemented as a purely linear mapping:
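A minimal sketch of how such a linear mapping and the subsequent read-out could look. The linkage matrix C, the decoding rule (value from the relation between adjacent channels, confidence from their magnitude), and all numbers are illustrative assumptions, not the deck's exact formulation:

```python
import numpy as np

def linear_map(C, a):
    """Discrete event mode: the output channel vector u is a purely
    linear function of the sparse, monopolar feature vector a."""
    return C @ a

def decode_channels(u, centers):
    """Read back a state value and a confidence from a channel vector u.

    Value      : relation between adjacent active channels (here a
                 centre-of-gravity over the strongest neighbourhood).
    Confidence : overall magnitude of those channels.
    This decoder is an illustrative assumption.
    """
    k = int(np.argmax(u))
    lo, hi = max(k - 1, 0), min(k + 2, len(u))
    w, pos = u[lo:hi], centers[lo:hi]
    conf = w.sum()
    value = float(w @ pos) / conf if conf > 0 else float("nan")
    return value, conf

# Tiny demo with hand-made numbers (purely illustrative):
centers = np.linspace(0.0, 1.0, 6)
u = np.array([0.0, 0.0, 0.3, 0.9, 0.3, 0.0])
print(decode_channels(u, centers))   # value near centers[3], confidence 1.5
```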

21 Computer Vision Laboratory 21 Continuous function mapping

22 Computer Vision Laboratory 22 Continuous function mapping There are a number of characteristics of this mode: It uses rather complete sets of feature functions, compared to the mapping onto response channels in the discrete event mode. The structure can still handle local feature dropouts without adverse effects upon well-behaved regions.

23 Computer Vision Laboratory 23 Continuous function mapping II Mapping is made to continuous output response variables, which may have a magnitude which varies over a large range. A high degree of accuracy in the mapping can be obtained if the feature vector is normalized, as stated below.

24 Computer Vision Laboratory 24 In this mode, however, it is not possible to represent both a state value x and a confidence measure r, unless this is done explicitly.

25 Computer Vision Laboratory 25 Confidence measure For the confidence measure a linear model is assumed,

26 Computer Vision Laboratory 26 The model for this mode comes out as
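A minimal sketch of this mode. The confidence model r = w^T a and the estimate x_hat = c^T a / (w^T a) are assumptions inferred from the surrounding slides (in particular the sensitivity analysis later on), not a verbatim reproduction of the slide's own equations:

```python
import numpy as np

def continuous_response(c, w, a):
    """Continuous function mode (illustrative reading of the slides).

    r      = w . a        : linear confidence model for the sample
    x_hat  = (c . a) / r  : response normalised by the confidence,
                            so a global rescaling of a cancels out.
    Model and symbols are assumptions inferred from the deck.
    """
    r = float(w @ a)
    x_hat = float(c @ a) / r if r > 0 else float("nan")
    return x_hat, r
```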

27 Computer Vision Laboratory 27 Overview of operation modes Although the two modes use a similar mapping, there are distinctive differences, as illustrated in the table on the next slide.

28 Computer Vision Laboratory 28 Overview of operation modes

29 Computer Vision Laboratory 29 Comparison between modes The major difference is in how they deal with zero, or more properly with no information, and with interpolation. This depends upon whether the output is a channel vector or not. If it is, position encoding with confidence will be used. This should be viewed as the normal mode of operation for this architecture, as this forms the appropriate representation for continued processing at the next level.

30 Computer Vision Laboratory 30 A conversion back to scalars is used mainly for easier interpretation of the states or for control of some linear actuator.

31 Computer Vision Laboratory 31 Associative structure

32 Computer Vision Laboratory 32 Training setup

33 Computer Vision Laboratory 33 Batch mode training

34 Computer Vision Laboratory 34

35 Computer Vision Laboratory 35 Regularization properties An unrestricted least squares solution tends to give a full matrix with negative and positive coefficients. These coefficients do their best to push and pull the basis functions to minimize the error for the particular training set, a.k.a. overfitting.

36 Computer Vision Laboratory 36 In many situations a linkage matrix with a small norm is preferred, e.g. in Tikhonov regularization, because a small norm reduces the global noise propagation.
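For reference, the standard Tikhonov-regularized least-squares form, written here in the deck's matrix notation (the exact weighting used in the deck may differ; this is an assumption):

```latex
\[
  \min_{C}\;\; \| U - C A \|_F^{2} \;+\; \lambda\, \| C \|_F^{2},
  \qquad \lambda > 0 .
\]
```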

37 Computer Vision Laboratory 37 It is desirable that a large error, due to drop-outs, heavy noise etc., in one region of the state space should not affect the performance in other, sufficiently distant regions of the state space. This can be achieved using a non-negativity constraint together with locality.

38 Computer Vision Laboratory 38 While non-negativity does not give the smallest possible error on the training set, nor the smallest norm of the linkage matrix, it acts as a regularization. The non-negativity constraint will be referred to as monopolar regularization.

39 Computer Vision Laboratory 39 Monopolarity in addition gives a sparser linkage matrix, allowing faster processing.
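To make the contrast between the two regularizers concrete, a small numerical sketch comparing a Tikhonov (ridge) solution with a non-negativity-constrained solution on the same toy system. The toy data, the regularization parameter, and the use of scipy.optimize.nnls are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
A = np.abs(rng.normal(size=(20, 10)))        # monopolar feature samples (rows)
c_true = np.zeros(10); c_true[[2, 3]] = 1.0  # sparse, local "true" linkage vector
u = A @ c_true + 0.01 * rng.normal(size=20)  # noisy responses

# Tikhonov / ridge: small norm, but typically a full (non-sparse) solution
# with coefficients of both signs.
lam = 0.1
c_tik = np.linalg.solve(A.T @ A + lam * np.eye(10), A.T @ u)

# Monopolar regularization: non-negativity constraint, slightly larger
# residual, but sparse and local.
c_mono, _ = nnls(A, u)

print("ridge    :", np.round(c_tik, 2))
print("monopolar:", np.round(c_mono, 2))
```

On such a toy problem the ridge solution is typically dense with mixed signs, while the non-negative solution recovers a sparse, local linkage vector, in line with the comparison on the following slides.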

40 Computer Vision Laboratory 40 Tikhonov regularization

41 Computer Vision Laboratory 41 Monopolar regularization

42 Computer Vision Laboratory 42 Tikhonov versus monopolar

43 Computer Vision Laboratory 43 Properties Tikhonov Tikhonov regularization tends to produce non-sparse, non-local, solutions which give a computationally more complex system. We can also observe ringing effects which typically occur for linear methods.

44 Computer Vision Laboratory 44 Properties monopolar The non-negativity constraint on the other hand gives a solution with a slightly larger norm, but it is both sparse and local. Note that only two elements in c_mono are non-zero.

45 Computer Vision Laboratory 45 Loosely coupled subsystems The degree to which variables are shared among subsystems can be expressed as a state distance or coupling metric between subsystems. Subsystems at a larger distance according to this metric will not affect each other, e.g. in a process of optimization.

46 Computer Vision Laboratory 46 Full versus sparse local matrix

47 Computer Vision Laboratory 47 Rearranged sparse local matrix

48 Computer Vision Laboratory 48 Locality of models The local behavior of the feature and response channels, together with the monopolar constraint on the linkage matrix, implies that the solution for one local region of the response space does not affect the solution in other, sufficiently distant, regions of the response space.

49 Computer Vision Laboratory 49 This property is also essential as we want to add new features after training the system, such as in the implementation of an incremental learning procedure. A new feature that is never active within the active domain of a response channel will not affect the previously learned links to that response channel.

50 Computer Vision Laboratory 50 Loose coupling leads to fast convergence Another important advantage of loosely coupled subsystem structures is that the iterative procedure to compute the model parameters, in this case the linkage matrix, exhibits fast convergence.

51 Computer Vision Laboratory 51 Optimization C is computed as the solution to the constrained weighted least-squares problem where

52 Computer Vision Laboratory 52 Weight matrix The weight matrix W, which controls the relevance of each sample, is chosen. The minimization problem does not generally have a unique solution, as it can be under-determined or over-determined.

53 Computer Vision Laboratory 53 Normalization This can be rewritten so that D_s and D_f appear as normalizations in the sample and feature domain, respectively. We will refer to these as sample domain normalization and feature domain normalization.

54 Computer Vision Laboratory 54 Solution for linkage matrix The solution is derived from the projected Landweber method
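A minimal sketch of a projected Landweber iteration for the non-negative least-squares problem. The objective min over C >= 0 of ||U - C A||_F^2, the step length, and the iteration count are illustrative assumptions; the actual formulation also involves the weight and normalization matrices discussed on the surrounding slides, which are omitted here for clarity:

```python
import numpy as np

def projected_landweber(A, U, n_iter=500, step=None):
    """Projected Landweber sketch for  min_{C >= 0} ||U - C A||_F^2.

    A : features x samples, U : responses x samples, C : responses x features.
    Each iteration takes a gradient step on the least-squares error and
    then projects onto the non-negative orthant (the monopolar constraint).
    """
    if step is None:
        # A safe step length: 1 / spectral norm of A A^T.
        step = 1.0 / np.linalg.norm(A @ A.T, 2)
    C = np.zeros((U.shape[0], A.shape[0]))
    for _ in range(n_iter):
        G = (C @ A - U) @ A.T                 # gradient of the residual term
        C = np.maximum(C - step * G, 0.0)     # gradient step + projection
    return C
```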

55 Computer Vision Laboratory 55 Associative structure

56 Computer Vision Laboratory 56 Separability of solution Note that the structure implies independent optimization of each response, i.e. the optimization is performed locally in the response domain. Hence we may, as an equivalent alternative, optimize each linkage vector separately as
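A plausible reading of that per-response problem, with the symbols assumed from the surrounding slides (u_k is the k-th row of the response matrix, c_k the corresponding linkage vector; weighting and normalization matrices omitted):

```latex
\[
  \min_{\mathbf{c}_k \ge 0}\;\;
  \bigl\| \mathbf{u}_k^{T} - \mathbf{c}_k^{T} A \bigr\|^{2},
  \qquad k = 1, \dots, K .
\]
```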

57 Computer Vision Laboratory 57 Removal of small coefficients in C Coefficients of matrix C which are below a certain threshold C min shall be eliminated altogether, as this gives a more sparse matrix. The thresholding is implemented repeatedly during the iterative procedure. The increase of error on new data turns out to be minimal, as the iterative optimization will compensate for this removal of coefficients.

58 Computer Vision Laboratory 58 Coefficients of C Furthermore, the coefficients of matrix C shall be limited in magnitude, as this gives a more robust system. This can be achieved by using a constraint C ≤ C_max. The assumption is that significant feature values a_h shall be within some standard magnitude range, and consequently the value of a non-zero a_h shall be above some minimum value a_min.
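A sketch of the pruning and limiting just described, as it might be applied between iterations of the optimization; the values of c_min and c_max are illustrative assumptions:

```python
import numpy as np

def prune_and_limit(C, c_min=1e-3, c_max=1.0):
    """Sparsify and bound the linkage matrix between iterations.

    Coefficients below c_min are set to zero (a sparser matrix, with the
    continued iterations compensating for the removal); coefficients are
    also clipped at c_max so that no single link can dominate.
    The values of c_min and c_max are illustrative.
    """
    C = np.where(C < c_min, 0.0, C)
    return np.minimum(C, c_max)
```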

59 Computer Vision Laboratory 59 Normalization modes The normalization can be viewed as a way to control the gain in the feedback loop implied by the iterative optimization procedure. This normalization can be put in either of the two representation domains, the sample domain or the feature domain, but with different effects upon convergence, accuracy, etc. For each choice of sample domain normalization D_s there are non-unique choices of feature domain normalization D_f.

60 Computer Vision Laboratory 60 Normalization modes The choice of normalization depends on the operation mode, i.e. continuous function mapping or discrete event mapping. There are some choices that are of particular interest.

61 Computer Vision Laboratory 61 Associative structure

62 Computer Vision Laboratory 62 Discrete event mapping Discrete event mode corresponds to a particular sample domain normalization matrix, which gives:

63 Computer Vision Laboratory 63 Continuous function mapping One approach is to assume that all training samples have the same confidence, i.e. s = 1, and compute D_s and D_f such that

64 Computer Vision Laboratory 64 We can use the sum of the feature channels as a measure of confidence:

65 Computer Vision Laboratory 65 Mixed domain normalization The corresponding sample domain normalization matrix is

66 Computer Vision Laboratory 66 Mixed domain normalization And the feature domain normalization matrix becomes

67 Computer Vision Laboratory 67 It can also be argued that a feature element which is frequently active should have a higher confidence than a feature element which is rarely active. This can be included in the confidence measure by using a weighted sum of the features, where the weight is proportional to the mean in the sample domain:
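A sketch of the two confidence measures described on these slides: the plain sum of the feature channels, and the weighted sum with weights proportional to the mean of each feature over the sample set. The exact scaling of the weights is an assumption:

```python
import numpy as np

def sample_confidences(A, weighted=False):
    """Confidence per sample from a feature matrix A (features x samples).

    weighted=False : r_n = sum_h a_hn             (plain channel sum)
    weighted=True  : r_n = sum_h m_h a_hn, with m_h = mean_n a_hn,
                     so frequently active features count for more.
    The corresponding sample domain normalization divides each column
    of A by its confidence; the exact scaling is an illustrative assumption.
    """
    if weighted:
        m = A.mean(axis=1)      # mean activity of each feature channel
        return m @ A            # weighted sum per sample
    return A.sum(axis=0)
```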

68 Computer Vision Laboratory 68 Sample domain normalization This corresponds to the sample domain normalization matrix which gives

69 Computer Vision Laboratory 69 Sample domain normalization This corresponds to the sample domain normalization matrix which gives

70 Computer Vision Laboratory 70 Summary of normalization modes

71 Computer Vision Laboratory 71 Sensitivity analysis for continuous function mode A response state estimate is generated from a feature vector a according to the model. Regardless of the choice of normalization vector w, the response will be independent of any global scaling of the features, i.e.
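Written out under the model x_hat = c^T a / (w^T a), which is itself an assumption inferred from the surrounding slides, the invariance is:

```latex
\[
  \hat{x}(\gamma\,\mathbf{a})
  = \frac{\mathbf{c}^{T}(\gamma\,\mathbf{a})}{\mathbf{w}^{T}(\gamma\,\mathbf{a})}
  = \frac{\gamma\,\mathbf{c}^{T}\mathbf{a}}{\gamma\,\mathbf{w}^{T}\mathbf{a}}
  = \frac{\mathbf{c}^{T}\mathbf{a}}{\mathbf{w}^{T}\mathbf{a}}
  = \hat{x}(\mathbf{a}),
  \qquad \gamma > 0 .
\]
```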

72 Computer Vision Laboratory 72 If multiplicative noise is applied, represented by a diagonal matrix, we get

73 Computer Vision Laboratory 73 Experimental verification A generalization of the common CMU twin-spiral pattern has been used, as it is a standard benchmark for evaluating classification networks. We have chosen to make the pattern more difficult in order to show that the proposed learning machinery can represent both continuous function mappings (regression) and mappings to discrete classes (classification).

74 Computer Vision Laboratory 74 Test pattern

75 Computer Vision Laboratory 75 Pattern selected to demonstrate the following properties: The ability to approximate piecewise continuous surfaces. The ability to describe discontinuities (i.e. assignment into discrete classes). The transition between interpolation and representation of a discontinuity. The inherent approximation introduced by the sensor channels.

76 Computer Vision Laboratory 76 Noise immunity test The robustness is analyzed with respect to three types of noise on the feature vector: additive noise, multiplicative noise, and impulse noise (see the sketch below).
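As a concrete reading of the three noise types, a small sketch; the distributions and noise levels are assumptions for illustration, not the deck's exact test protocol:

```python
import numpy as np

rng = np.random.default_rng(1)

def add_noise(a, kind, level=0.1):
    """Apply one of the three noise types to a feature vector a.

    additive       : a + n, with n drawn from N(0, level)
    multiplicative : D a with diagonal D, d_h drawn from N(1, level)
    impulse        : a fraction `level` of the elements replaced by
                     large random values
    Distributions and levels are illustrative assumptions.
    """
    if kind == "additive":
        return a + rng.normal(0.0, level, size=a.shape)
    if kind == "multiplicative":
        return rng.normal(1.0, level, size=a.shape) * a
    if kind == "impulse":
        out = a.astype(float).copy()
        hit = rng.random(a.shape) < level
        out[hit] = rng.uniform(0.0, a.max() + 1.0, size=hit.sum())
        return out
    raise ValueError(kind)
```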

77 Computer Vision Laboratory 77 Experimental setup In the experiments, a three dimensional state space is used. The sensor space and the response space are orthogonal projections of the state space.

78 Computer Vision Laboratory 78 Test image function Note that this mapping can be seen as a surface of points. The analytic expression is:

79 Computer Vision Laboratory 79 Associative network variants 1. Mixed domain normalization bipolar network 2. Mixed domain normalization monopolar network 3. Sample domain normalization monopolar network 4. Uniform sample confidence monopolar network 5. Discrete event mode monopolar network

80 Computer Vision Laboratory 80 Performance of bipolar network (#1) under varied number of samples. Top left to bottom right: N=63, 125, 250, 500, 1000, 2000, 4000, 8000

81 Computer Vision Laboratory 81

82 Computer Vision Laboratory 82 Performance of discrete event network (#5) under varied number of samples. Top left to bottom right: N=63, 125, 250, 500, 1000, 2000, 4000, 8000

83 Computer Vision Laboratory 83 Performance of discrete event network (#5) under varied number of channels. Top left to bottom right: K=3 to K=14

84 Computer Vision Laboratory 84 Number of non-zero coefficients under varied number of channels.

