Daniel J. Garcia
Mentors: Dr. Lawrence Hall, Dr. Dmitry Goldgof, Kurt Kramer

Introduction
Feature selection methods are used to find the set of features that yields the best classification accuracy for a given data set. The smaller feature set makes a classifier faster to train and to run, and it also improves classification accuracy. Feature selection, however, is a time-consuming process that is unfit for real-time applications.

Random Sets Method Flowchart
- Generate 200 random feature sets.
- Run 10-fold cross-validation on each random set.
- Sort the results of 10-fold cross-validation by training time.
- Select the 9 fastest random feature sets.
- Create three new sets: the union of the fastest 3, the union of the fastest 5, and the union of the fastest 9.

Results (Comparison between the Random Sets Method and the Well-Known Wrapper Method)

Accuracy Comparison:
- 3.4% less accurate than the best achieved accuracy
- 1.96% less accurate than the best achieved accuracy
- 1.54% less accurate than the best achieved accuracy

Testing the Hypothesis
- Features from the fastest random sets are unequivocally better than features from the slower sets. This supports our hypothesis.
- The superiority of these features is clearly seen by comparing the unions of 3 sets: the union of the fastest 3 sets is more accurate than the union of the slowest 3 sets, despite having fewer features to work with.

Speed Comparison:
- The Random Sets method is considerably faster than the wrapper approach.
- The average difference in feature selection time between the Random Sets method and the wrapper method is 2 hours and 11 minutes.

Conclusion
As has been shown, using random feature sets as a feature selection tool benefits learning algorithms. Real-time application is one of the greatest benefits, perhaps allowing a limited feature selection algorithm to be run as new data is gathered. The random sets approach is fast, is very accurate in certain situations, and takes great advantage of parallel processing, since the cross-validation run on each random set is independent of the others.
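The steps of the Random Sets Method flowchart can be sketched in Python as follows. This is a minimal, illustrative sketch, not the authors' implementation. The subset size (`set_size`), the `train_fn` hook, and the `cost_fn` parameter are hypothetical. In a real run, `cost_fn` would wrap `cv_training_time` around an actual classifier's training routine (for example an SVM). The values 200 sets, 10 folds, 9 fastest, and unions of 3, 5, and 9 are taken from the flowchart.

```python
import random
import time
from typing import Callable, Dict, FrozenSet, Iterator, List, Sequence, Tuple

Subset = FrozenSet[int]


def k_fold_splits(n_samples: int, k: int = 10,
                  seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cv_training_time(subset: Subset,
                     train_fn: Callable[[Subset, List[int]], object],
                     n_samples: int, k: int = 10) -> float:
    """Total wall-clock seconds spent training across the k training splits."""
    total = 0.0
    for train_idx, _test_idx in k_fold_splits(n_samples, k):
        start = time.perf_counter()
        train_fn(subset, train_idx)  # hypothetical hook, e.g. fit a classifier
        total += time.perf_counter() - start
    return total


def random_sets_selection(n_features: int, set_size: int,
                          cost_fn: Callable[[Subset], float],
                          n_sets: int = 200, n_fastest: int = 9,
                          union_sizes: Sequence[int] = (3, 5, 9),
                          seed: int = 0) -> Dict[int, Subset]:
    """Return {m: union of the m fastest random feature subsets}."""
    rng = random.Random(seed)
    # Generate n_sets random feature subsets of a fixed (assumed) size.
    subsets = [frozenset(rng.sample(range(n_features), set_size))
               for _ in range(n_sets)]
    # Score each subset by its training cost and sort, fastest first.
    ranked = sorted(subsets, key=cost_fn)
    # Keep only the fastest subsets.
    fastest = ranked[:n_fastest]
    # Union the fastest m subsets for each requested m.
    return {m: frozenset().union(*fastest[:m]) for m in union_sizes}


if __name__ == "__main__":
    # Synthetic stand-in for a measured training time (hypothetical):
    # pretend that subsets made of low-numbered features train faster.
    fake_cost = lambda s: float(sum(s))
    unions = random_sets_selection(n_features=50, set_size=5, cost_fn=fake_cost)
    for m, feats in unions.items():
        print(m, sorted(feats))
```

Sorting with `key=cost_fn` evaluates each subset exactly once. Because the unions are built from one ranked list, the union of the fastest 3 is always contained in the union of the fastest 5, and that one in the union of the fastest 9. The 200 evaluations are independent, so in practice they could be distributed across processes, which is the parallelism the conclusion refers to.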
[Figure: The Big Picture]

Department of Computer Science & Engineering, REU 2006: Feature Selection Algorithm from Random Subsets