Random Subspace Feature Selection for Analysis of Data with Missing Features
Presented by: Joseph DePasquale
Student Activities Conference 2007
This material is based upon work supported by the National Science Foundation under Grant No. ECS. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Outline
- Motivation
- Missing feature algorithm
  - Selecting features for training
  - Finding usable classifiers for testing
- Impact of free parameters
  - Number of features used for training
  - Distribution update parameter β
Motivation
- Missing data is a real-world issue, caused by:
  - Failed equipment
  - Human error
  - Natural phenomena
- Matrix multiplication cannot be used if even a single data value is missing
[Figure: Missing Feature]
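To illustrate why a missing value breaks matrix multiplication, the sketch below computes the weighted sum a single neuron forms (one row of a matrix-vector product) in plain Python. Encoding the missing value as float("nan") is an assumption made for illustration; it simply makes visible that the result is undefined once any input is missing.

```python
import math

def weighted_sum(weights, inputs):
    """One row of a matrix-vector product: the sum of w_i * x_i."""
    return sum(w * x for w, x in zip(weights, inputs))

weights = [0.5, -1.0, 2.0]
complete = [2.0, 1.0, 3.0]
# Assumption: a missing feature value is encoded as NaN.
incomplete = [2.0, float("nan"), 3.0]

print(weighted_sum(weights, complete))                # 6.0
print(math.isnan(weighted_sum(weights, incomplete)))  # True
```

Because every output depends on every input, one missing entry invalidates the whole result, so a network trained on all features cannot classify an incomplete instance.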
Training
[Figure legend: X; f_i, feature i; feature used in training; feature not used in training; C_i, classifier i]
Usable Classifiers
[Figure legend: usable classifier]
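A classifier can be evaluated on a test instance only if every feature it was trained on is present. A minimal sketch of that check follows, together with a vote over the usable classifiers. The function names, the set-based representation of each classifier's feature subset, and the use of simple majority voting to combine outputs are assumptions for illustration.

```python
from collections import Counter

def usable_classifiers(feature_subsets, missing):
    """Indices of classifiers trained only on features that are present.

    feature_subsets[i] is the set of feature indices classifier C_i was
    trained on; `missing` is the set of feature indices absent from the
    test instance.
    """
    return [i for i, subset in enumerate(feature_subsets)
            if subset.isdisjoint(missing)]

def majority_vote(predictions, usable):
    """Most common label among usable classifiers (None if none are usable)."""
    if not usable:
        return None
    counts = Counter(predictions[i] for i in usable)
    return counts.most_common(1)[0][0]

# Hypothetical ensemble: 4 classifiers, each trained on 2 of 5 features.
subsets = [{0, 1}, {1, 3}, {2, 4}, {0, 4}]
# Hypothetical outputs; only the usable classifiers' labels are read.
predictions = ["A", "B", "B", "B"]
missing = {1}  # feature f_1 is missing from this test instance

usable = usable_classifiers(subsets, missing)
print(usable)                              # [2, 3]
print(majority_vote(predictions, usable))  # B
```

If too many features are missing, no classifier may be usable; training many classifiers on small feature subsets makes this less likely.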
Experimental Setup
Research has been done on static selection of the features used for training.
Datasets, with total number of features f: VOC (12), PEN (16), ION (33), WBC (30)
Table columns: Dataset (f); nof 1 to nof 4, the four tested numbers of features used to train each classifier; T, the number of classifiers in the ensemble
Volatile Organic Compound Database
Pen Digits Recognition Database
Ionosphere Database
Wisconsin Breast Cancer Database
Conclusions
- β does not significantly affect the algorithm's performance.
- The number of features used for training does affect performance.
Learn++.MF
- Training
  - Selecting features from a distribution
  - Training the network
  - Updating the likelihood of selecting features
- Testing
  - Data corruption
  - Identifying usable classifiers
  - Simulation
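The training steps of Learn++.MF (selecting features from a distribution, then updating the likelihood of selecting them) can be sketched as follows. This is a minimal illustration under stated assumptions: the distribution starts uniform, each classifier draws nof distinct features weighted by the current distribution, and the update multiplies the probability of each selected feature by β (0 < β ≤ 1) before renormalizing, so features used less often become more likely to be picked. The function names are hypothetical, and the network training step is left as a placeholder.

```python
import random

def select_features(p, nof, rng):
    """Draw nof distinct feature indices, each draw weighted by p."""
    available = list(range(len(p)))
    weights = list(p)
    chosen = []
    for _ in range(nof):
        k = rng.choices(range(len(available)), weights=weights)[0]
        chosen.append(available.pop(k))
        weights.pop(k)  # keep weights aligned with the remaining features
    return sorted(chosen)

def update_distribution(p, chosen, beta):
    """Scale selected features' probabilities by beta, then renormalize."""
    q = [pi * beta if i in chosen else pi for i, pi in enumerate(p)]
    total = sum(q)
    return [qi / total for qi in q]

def train_feature_subsets(n_features, nof, T, beta, seed=0):
    """Pick the feature subsets for T classifiers (training itself omitted)."""
    if not 0 < beta <= 1:
        raise ValueError("beta must be in (0, 1]")
    if not 1 <= nof <= n_features:
        raise ValueError("nof must be between 1 and n_features")
    rng = random.Random(seed)
    p = [1.0 / n_features] * n_features  # uniform starting distribution
    subsets = []
    for _ in range(T):
        chosen = select_features(p, nof, rng)
        # Placeholder: train classifier C_t on the columns in `chosen` here.
        subsets.append(chosen)
        p = update_distribution(p, set(chosen), beta)
    return subsets, p

# Hypothetical settings: 12 features (as in VOC), 3 per classifier, 10 classifiers.
subsets, p = train_feature_subsets(n_features=12, nof=3, T=10, beta=0.5)
for s in subsets:
    print(s)
```

With β = 1 the distribution never changes and the procedure reduces to uniform random subspace selection; smaller β values push later classifiers toward features that have been used less, spreading coverage across all features.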