RBF TWO-STAGE LEARNING NETWORKS: EXPLOITATION OF SUPERVISED DATA IN THE SELECTION OF HIDDEN UNIT PARAMETERS
An application to SAR data classification
Objectives
Improve the performance of RBF two-stage learning networks by introducing parameter adaptation criteria in which supervised data are exploited to adjust hidden unit parameters (traditionally determined by unsupervised learning techniques).
Topics of discussion
  - Properties of RBF networks vs. MLPs
  - One- and two-stage RBF network learning strategies
  - Recent developments in RBF two-stage learning algorithms (Bruzzone, IEEE TGARS, 1999)
  - Classification examples
[Figure panels: RBF network; MLP network (hard threshold perceptron)]
MLP network
  - Advantages
      - distribution-free
      - importance-free (data fusion)
      - one-stage supervised learning algorithm (BP)
  - Disadvantages
      - slow to train
      - may converge to a local minimum
      - high output responses to input data that fall into regions of the input space where there are no training examples (extrapolation)
      - sensitive to outliers, which affect every free parameter
      - network topology is not data-driven (model selection)
RBF network
  - Advantages
      - simple two-layer architecture to solve complex tasks
      - two-stage hybrid (unsupervised 1st stage + supervised 2nd stage) learning scheme
          - fast to train
          - closed-form linear optimization of the output weights
      - localized BFs
          - low output responses to input data that fall into regions of the input space where there are no training examples
          - learning of each input sample affects a specialized subset of the network parameters (modularization; outliers do not affect every free parameter)
          - easy interpretation of the processing units
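Two of the properties listed above, closed-form optimization of the output weights and low responses far from the training data, can be illustrated with a minimal NumPy sketch. It assumes Gaussian BFs, hand-picked centers and widths, no bias term, and synthetic two-class data; the function names and all numeric values are illustrative and do not come from the slides.

```python
import numpy as np

def gaussian_design_matrix(X, centers, widths):
    """Hidden-layer activations phi[n, j] = exp(-||x_n - c_j||^2 / (2 * s_j^2))."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * widths[None, :] ** 2))

def fit_output_weights(Phi, T):
    """Second stage: linear least-squares fit of the output weights."""
    W, *_ = np.linalg.lstsq(Phi, T, rcond=None)
    return W

# Synthetic two-class data (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(2.0, 0.3, (50, 2))])
T = np.vstack([np.tile([1.0, 0.0], (50, 1)), np.tile([0.0, 1.0], (50, 1))])

# Hidden-unit parameters are fixed by hand here; the slides discuss how to learn them.
centers = np.array([[0.0, 0.0], [2.0, 2.0]])
widths = np.array([0.5, 0.5])

Phi = gaussian_design_matrix(X, centers, widths)
W = fit_output_weights(Phi, T)
train_accuracy = (np.argmax(Phi @ W, axis=1) == np.argmax(T, axis=1)).mean()

# A point far from every training sample activates no BF, so all outputs stay near zero.
far_point = np.array([[20.0, -20.0]])
far_output = gaussian_design_matrix(far_point, centers, widths) @ W
```

Because the hidden-unit parameters are held fixed, the output layer reduces to an ordinary linear least-squares problem, which is what makes the second stage fast.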
RBF network
  - Disadvantages
      - the classification error depends strongly on the number, centers and widths of the BFs
      - multiplicity of one-stage (supervised [error-driven]) and two-stage (supervised, hybrid [error-driven + data-driven]) learning algorithms
RBF network learning strategies
  - One-stage supervised (error-driven) learning
      - GBF sum-of-squared error gradient descent (Bishop, 1995)
          - Disadvantages
              - GBFs may not stay localized
              - no effect on the positions (centers) of GBFs
              - no model selection
      - new types of BFs suitable for gradient descent learning (Karayiannis, IEEE TNN, 1998)
          - Disadvantages
              - no model selection
      - constructive learning (Fritzke, IEEE TNN, 1994)
          - Disadvantages
              - unlimited growth
              - unstable (small input variations cause large output changes)
RBF network learning strategies
  - Two-stage learning
      - hybrid learning (Moody and Darken, 1989)
          - first stage (hidden layer): data-driven
              - BF centers: clustering
              - BF spread parameters: p-nearest neighbor heuristic
          - second stage: error-driven
              - gradient descent
              - pseudo-inverse linear optimization (may be unstable)
              - majority voting
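The two stages listed above can be sketched end to end. This is a minimal illustration under stated assumptions, not the original algorithm: plain k-means stands in for "clustering", each width is taken as the mean distance to the p nearest other centers (one common form of the heuristic), Gaussian BFs are assumed, and the second stage uses the pseudo-inverse option. The data and every name below are illustrative.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def p_nearest_widths(centers, p=2):
    """Width of each BF = mean distance to its p nearest other centers (needs k > p)."""
    d = np.sqrt(((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))
    np.fill_diagonal(d, np.inf)  # a center is not its own neighbor
    return np.sort(d, axis=1)[:, :p].mean(axis=1)

# Synthetic three-class data (illustrative only).
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X = np.vstack([rng.normal(m, 0.3, (40, 2)) for m in means])
y = np.repeat([0, 1, 2], 40)
T = np.eye(3)[y]  # one-hot targets

# First stage (data-driven): labels y are not used here.
centers = kmeans(X, k=4)
widths = p_nearest_widths(centers, p=2)

# Second stage (error-driven): pseudo-inverse solution for the output weights.
sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
Phi = np.exp(-sq_dist / (2.0 * widths[None, :] ** 2))
W = np.linalg.pinv(Phi) @ T
```

Note that the class labels enter only in the second stage, which is the limitation the supervised first-stage variants aim to remove.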
RBF network learning strategies
  - Two-stage learning
      - hybrid learning (Moody and Darken, 1989)
          - Disadvantages
              - no model selection (number of BFs)
              - mixed clusters: unsupervised learning does not reflect the local complexity of the classification problem at hand
              - increasing the number of BFs does not guarantee an improvement in the system's performance
RBF network learning strategies
  - Two-stage learning
      - constructive (Karayiannis, 1998): error-driven location of new BFs
          - Disadvantages
              - only one unit is inserted per two-stage growing cycle
      - supervised learning
          - first stage (hidden layer)
              - BF centers
                  - Karayiannis, 1998: gradient descent of the "localized class-conditional activation variance" (LOCCAV) method
                  - Bruzzone, IEEE TGARS, 1999: class-conditional (constructive) clustering
              - BF spread parameters
                  - Karayiannis: LOCCAV or "localized class-conditional quantization error" (LOCCEQ)
                  - Bruzzone: class-conditional p-nearest neighbor heuristic
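One way to read "class-conditional clustering" is to cluster the training samples of each class separately, so that every BF center inherits a class label. The sketch below assumes plain k-means per class with a fixed number of centers per class; the constructive (growing) part of the method is not reproduced, and the function names, data, and parameter values are illustrative rather than taken from the cited paper.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def class_conditional_centers(X, y, k_per_class):
    """Cluster each class on its own; every center carries its class label."""
    centers, center_class = [], []
    for c in np.unique(y):
        centers.append(kmeans(X[y == c], k_per_class))
        center_class.append(np.full(k_per_class, c))
    return np.vstack(centers), np.concatenate(center_class)

# Synthetic two-class data (illustrative only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (60, 2)), rng.normal(5.0, 0.5, (60, 2))])
y = np.repeat([0, 1], 60)

centers, center_class = class_conditional_centers(X, y, k_per_class=3)
```

Since each center is a mean of samples from a single class, no cluster mixes classes, which addresses the "mixed clusters" drawback of purely unsupervised first stages.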
RBF network learning strategies
  - Two-stage learning
      - supervised learning
          - first stage (hidden layer)
              - BF spread parameters
                  - Bruzzone: class-conditional p-nearest neighbor heuristic. Given one BF center i, which belongs to a class, if the 3 nearest BF centers h, k and m belong to the same class, then make the less conservative choice:
                        σ_i = (d(i, h) + d(i, k)) / 2
                    otherwise make the more conservative choice: LOCCEQ
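The rule above can be sketched as follows. Several points are assumptions rather than statements from the slide: "the same class" is read as the class of center i, h and k are taken to be the two nearest of the three neighbors, and d is the Euclidean distance. LOCCEQ is not defined in the slides, so the conservative branch is passed in as a function; the stand-in used in the demo (half the distance to the nearest center of another class) is hypothetical and is not LOCCEQ.

```python
import numpy as np

def class_conditional_widths(centers, center_class, conservative_width):
    """Spread of each BF from the classes of its 3 nearest neighboring centers."""
    d = np.sqrt(((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))
    np.fill_diagonal(d, np.inf)  # a center is not its own neighbor
    widths = np.empty(len(centers))
    for i in range(len(centers)):
        h, k, m = np.argsort(d[i])[:3]  # three nearest centers, closest first
        if np.all(center_class[[h, k, m]] == center_class[i]):
            # Neighborhood is class-pure: less conservative (wider) choice.
            widths[i] = 0.5 * (d[i, h] + d[i, k])
        else:
            # Mixed neighborhood: defer to the more conservative rule.
            widths[i] = conservative_width(i, d, center_class)
    return widths

def half_distance_to_other_class(i, d, center_class):
    """Hypothetical stand-in for the conservative rule (NOT LOCCEQ)."""
    other = center_class != center_class[i]
    return 0.5 * d[i, other].min()

# Illustrative centers on a line: four of class 0, four of class 1.
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.5, 10.0, 11.0, 12.0])
centers = np.column_stack([xs, np.zeros_like(xs)])
center_class = np.array([0, 0, 0, 0, 1, 1, 1, 1])

widths = class_conditional_widths(centers, center_class, half_distance_to_other_class)
```

In this toy layout only the two centers next to the class boundary (at x = 3 and x = 4.5) have a mixed neighborhood, so only they fall back to the conservative rule.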
Bruzzone's RBF network two-stage learning strategy
SAR ERS-1/ERS-2 tandem pair data classification task (IGARSS 2000)
Conclusions
Simple heuristic techniques exploiting supervised data in learning hidden parameters of an RBF two-stage learning network may lead to:
  - improvement in classification performance
  - enhanced performance stability with respect to changes in the number of hidden units
  - integration of class-conditional data with constructive unsupervised learning techniques to address the model selection issue