Data Visualization and Feature Selection: New Algorithms for Nongaussian Data. Howard Hua Yang and John Moody, NIPS '99
Contents: Data visualization (good 2-D projections for interpreting high-dimensional data). Feature selection (eliminate redundancy). Joint mutual information. ICA.
Introduction: Visualization of input data and feature selection are intimately related. Input variable selection is the most important step in the model selection process. Model-independent approaches select input variables before the model is specified. Data visualization is very important for helping humans understand the structural relations among the variables in a system.
Joint mutual information for input/feature selection: Mutual information. Kullback-Leibler divergence. Joint mutual information.
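
For reference, the three quantities named on this slide have the following standard definitions for discrete variables. The notation (inputs X_1, ..., X_k, output Y, distributions p and q) is assumed here rather than taken from the slides; for continuous variables the sums become integrals.

```latex
% Mutual information between an input X and the output Y
I(X;Y) = \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}

% Kullback-Leibler divergence, and MI written as a KL divergence
D(p\,\|\,q) = \sum_{z} p(z)\,\log\frac{p(z)}{q(z)},
\qquad
I(X;Y) = D\bigl(p(x,y)\,\|\,p(x)\,p(y)\bigr)

% Joint mutual information between k inputs and the output
I(X_1,\dots,X_k;Y) = D\bigl(p(x_1,\dots,x_k,y)\,\|\,p(x_1,\dots,x_k)\,p(y)\bigr)
```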
Conditional MI When Use joint mutual information instead of the mutual information to select inputs for a neural network classifier and for data visualization.
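
A minimal sketch of why the joint quantity can matter, assuming discrete inputs and plug-in (empirical frequency) estimates. The function names are illustrative and do not come from the talk. With y = x1 XOR x2, each input alone carries no information about y, but the pair determines it.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Plug-in (empirical) entropy in bits of a sequence of hashable symbols."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def joint_mutual_information(columns, ys):
    """I(X_1,...,X_k;Y): treat the tuple of inputs as a single variable."""
    return mutual_information(list(zip(*columns)), ys)

def conditional_mutual_information(xs, ys, given):
    """I(X;Y|Z) = I(X,Z;Y) - I(Z;Y), by the chain rule."""
    return joint_mutual_information([xs, given], ys) - mutual_information(given, ys)

# Toy data (an assumption for illustration): all four input combinations once.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [a ^ b for a, b in zip(x1, x2)]  # y = x1 XOR x2

print(mutual_information(x1, y))                  # single-input MI of x1
print(mutual_information(x2, y))                  # single-input MI of x2
print(joint_mutual_information([x1, x2], y))      # JMI of the pair
print(conditional_mutual_information(x2, y, x1))  # I(x2; y | x1)
```

On this toy data both single-input MIs are 0 bits while the JMI is 1 bit, so a ranking based on I(x_i; y) alone would discard both inputs.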
Data visualization methods: Supervised methods based on JMI (cf. CCA). Unsupervised methods based on ICA (cf. PCA). Efficient method for JMI.
Application to Signal Visualization and Classification: JMI and visualization of radar pulse patterns. Radar pattern: 15-dimensional vector, 3 classes. Compute JMIs, select inputs.
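
The "compute JMIs, select inputs" step can be sketched as a greedy forward selection. This is one possible procedure under stated assumptions, not necessarily the one used in the talk: inputs are discrete, the JMI is a plug-in estimate, and the data below are synthetic (not the radar data). `greedy_jmi_selection` is a hypothetical helper name.

```python
from collections import Counter
from itertools import product
from math import log2

def entropy(samples):
    """Plug-in entropy in bits of a sequence of hashable symbols."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

def jmi(columns, ys):
    """I(X_1,...,X_k;Y) in bits, treating the tuple of inputs as one variable."""
    xs = list(zip(*columns))
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def greedy_jmi_selection(features, ys, k):
    """Add, one at a time, the input that most increases the JMI with ys."""
    selected = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda j: jmi([features[i] for i in selected] + [features[j]], ys),
        )
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic data: every combination of three binary variables a, b, c.
rows = list(product([0, 1], repeat=3))
features = [
    [a for a, b, c in rows],  # input 0: informative
    [a for a, b, c in rows],  # input 1: exact copy of input 0 (redundant)
    [b for a, b, c in rows],  # input 2: informative
    [c for a, b, c in rows],  # input 3: unrelated to the class
]
ys = [2 * a + b for a, b, c in rows]  # class label depends on a and b only

# Ranking by single-input MI (ties keep index order) vs. greedy JMI selection.
by_mi = sorted(range(len(features)), key=lambda j: -jmi([features[j]], ys))[:2]
by_jmi = greedy_jmi_selection(features, ys, 2)
print("top-2 by MI: ", by_mi)
print("top-2 by JMI:", by_jmi)
```

On this synthetic data, inputs 0, 1, and 2 all have the same single-input MI, so the MI ranking keeps the redundant pair (0, 1), while the greedy JMI selection keeps the complementary pair (0, 2).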
Radar pulse classification: 7 hidden units. Experiments: all inputs vs. 4 selected inputs; 4 inputs with the largest JMI vs. 4 randomly selected inputs.
Conclusions: Advantages of the JMI: it can distinguish inputs when all of them have the same mutual information, and it can eliminate redundancy in the inputs when one input is a function of other inputs.
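
The redundancy claim follows from the chain rule for mutual information. A short derivation for discrete variables, with assumed notation (X_1, X_2 inputs, Y output, f any deterministic function):

```latex
% Chain rule
I(X_1, X_2; Y) = I(X_1; Y) + I(X_2; Y \mid X_1)

% If X_2 = f(X_1), then X_2 has no remaining uncertainty once X_1 is known:
I(X_2; Y \mid X_1) = H(X_2 \mid X_1) - H(X_2 \mid X_1, Y) = 0 - 0 = 0

% Hence adding the redundant input leaves the JMI unchanged
I(X_1, X_2; Y) = I(X_1; Y)
```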