Margin Based Sample Weighting for Stable Feature Selection
Yue Han, Lei Yu
State University of New York at Binghamton
Outline
- Introduction
- Related Work
- Hypothesis-Margin Feature Space Transformation
- Margin Based Sample Weighting
- Experimental Study
- Conclusion and Future Work
Introduction
- Data layout: rows are samples, columns are features (genes or proteins); p = # of features, n = # of samples.
- High-dimensional data: p >> n.
- Feature selection:
  - Alleviates the effect of the curse of dimensionality;
  - Enhances generalization capability;
  - Speeds up the learning process;
  - Improves model interpretability.
- Pipeline: high-dimensional data -> feature selection (filter or wrapper) -> dimension-reduced data -> learning model.
[Figure: a documents-by-terms matrix (documents D_1..D_M, terms T_1..T_N, class labels such as Sports, Travel, Jobs) illustrating high-dimensional data.]
Introduction (cont'd)
- D1 and D2: two training sets drawn from the same underlying data D.
- Given unlimited sample size of D: the feature selection results from D1 and D2 are the same.
- When the size of D is limited (n << p for high-dimensional data): the feature selection results from D1 and D2 differ.
- Increasing the number of samples can be very costly or impractical.
- Stability of feature selection: the insensitivity of the result of a feature selection algorithm to variations in the training set (see the sketch below).
- Motivation: identifying characteristic markers that explain the observed phenomena.
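To make this instability concrete, here is a minimal, self-contained sketch (not from the paper): it scores features by correlation with the class label on two halves of a synthetic high-dimensional dataset and compares the two selected subsets. The data, the scorer, and the top-k choice are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 60, 1000, 50                    # n << p; select the top-k features
X = rng.normal(size=(n, p))
y = (X[:, :5].sum(axis=1) + rng.normal(size=n) > 0).astype(float)  # 5 relevant features

def top_k(X, y, k):
    # score each feature by absolute correlation with the class label
    scores = np.abs(np.corrcoef(X.T, y)[-1, :-1])
    return set(np.argsort(scores)[-k:])

half = n // 2
s1 = top_k(X[:half], y[:half], k)         # selection from training set D1
s2 = top_k(X[half:], y[half:], k)         # selection from training set D2
print("overlap:", len(s1 & s2) / len(s1 | s2))  # typically far below 1.0
```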
Related Work
- Bagging-based ensemble feature selection (Saeys et al., ECML 2007):
  - Draw different bootstrap samples from the same training set;
  - Apply a conventional feature selection algorithm to each;
  - Aggregate the feature selection results (a minimal sketch follows).
- Group-based stable feature selection (Yu et al., KDD 2008, KDD 2009):
  - Explore the intrinsic feature correlations;
  - Identify groups of correlated features;
  - Select relevant feature groups.
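A minimal sketch of the bagging-based ensemble idea above, assuming a simple univariate correlation scorer and mean-rank aggregation; the original work plugs conventional feature selectors into the same role.

```python
import numpy as np

def ensemble_ranking(X, y, n_bags=20, seed=0):
    """Score features on bootstrap samples and aggregate by mean rank."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    rank_sum = np.zeros(p)
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)                # bootstrap sample
        scores = np.abs(np.corrcoef(X[idx].T, y[idx])[-1, :-1])
        rank_sum += scores.argsort().argsort()          # rank of each feature
    return rank_sum / n_bags                            # higher = more relevant
```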
Overview of the Proposed Framework
A framework of margin based sample weighting for stable feature selection:
- Introduce the concept of the hypothesis-margin (HM) feature space;
- Propose the framework of margin based sample weighting for stable feature selection;
- Develop an efficient algorithm under the proposed framework.
Hypothesis-Margin Feature Space Transformation
- Each sample X is transformed into X', which captures the local profile of feature importance for all features at X.
- The hypothesis margin of X is defined by its nearest hit (nearest neighbor of the same class) and nearest miss (nearest neighbor of a different class).
- Multiple nearest neighbors can be used to compute the HM of a sample.
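A sketch of the transformation, assuming the per-feature hypothesis-margin form |x - miss| - |x - hit| averaged over the k nearest hits and misses; the exact averaging in the paper may differ.

```python
import numpy as np
from scipy.spatial.distance import cdist

def hm_transform(X, y, k=1):
    """Map each sample into the HM feature space: dimension j of X' is the
    contribution of feature j to the hypothesis margin of X."""
    n, p = X.shape
    D = cdist(X, X)
    np.fill_diagonal(D, np.inf)                    # a sample is not its own neighbor
    X_hm = np.empty((n, p))
    for i in range(n):
        hits = np.where(y == y[i])[0]
        misses = np.where(y != y[i])[0]
        nh = hits[np.argsort(D[i, hits])[:k]]      # k nearest hits
        nm = misses[np.argsort(D[i, misses])[:k]]  # k nearest misses
        X_hm[i] = (np.abs(X[i] - X[nm]).mean(axis=0)
                   - np.abs(X[i] - X[nh]).mean(axis=0))
    return X_hm
```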
Hypothesis-Margin Feature Space Transformation (cont'd)
[Figure: hypothesis-margin based feature space transformation: (a) original feature space, and (b) hypothesis-margin (HM) feature space.]
Margin Based Sample Weighting
- Motivation: discrepancy among samples w.r.t. their local profiles of feature importance (i.e., their positions in the HM feature space).
- Measure the average distance of X' to all other samples in the HM feature space; a greater average distance indicates a higher outlying degree.
- Overall time complexity: O(n^2 q), where n is the number of samples and q is the dimensionality of D.
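A sketch of the weighting step, assuming weights inversely proportional to the average distance in the HM feature space (the paper's exact weighting function may differ); the pairwise-distance computation accounts for the O(n^2 q) cost stated above.

```python
import numpy as np
from scipy.spatial.distance import cdist

def margin_sample_weights(X_hm):
    """Weight samples by outlying degree in the HM feature space:
    a larger average distance to the other samples -> a lower weight."""
    D = cdist(X_hm, X_hm)                     # O(n^2 q) pairwise distances
    avg = D.sum(axis=1) / (len(X_hm) - 1)     # average distance per sample
    w = 1.0 / avg                             # illustrative weighting function
    return w * len(w) / w.sum()               # normalize to mean 1
```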
Experimental Study: Stability Metrics
- Stability of a feature selection algorithm is measured as the average pairwise similarity of the feature selection results produced by the same algorithm from different training sets.
- Similarity is measured at three levels: feature ranking, feature subset selection, and feature correlation.
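For the subset-selection level, one common instantiation of this average pairwise similarity is the mean Jaccard index over all pairs of selected subsets; a minimal sketch follows (the paper may use different similarity functions at each level).

```python
import numpy as np
from itertools import combinations

def subset_stability(selections):
    """Average pairwise Jaccard similarity over feature subsets, each
    selected from a different training set by the same algorithm."""
    sims = [len(a & b) / len(a | b) for a, b in combinations(selections, 2)]
    return float(np.mean(sims))
```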
Experimental Study (cont'd): Experimental Setup
- SVM-RFE: 10 percent of the remaining features eliminated at each iteration.
- En-RFE: 20 bootstrapped training sets used to construct the ensemble.
- IW-RFE: k = 10 for the hypothesis-margin transformation.
- 10-time shuffling and 10-fold cross-validation to generate 100 datasets.
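A sketch of how the SVM-RFE configuration above maps onto scikit-learn: step=0.1 removes 10 percent of the remaining features per iteration, and (for the IW-RFE variant) the margin-based sample weights can be forwarded to the SVM's fit. The synthetic data and target subset size are placeholders; this is an assumed wiring, not the authors' code.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 500))
y = rng.integers(0, 2, size=100)
w = np.ones(100)            # stand-in for the margin-based sample weights

rfe = RFE(SVC(kernel="linear"), n_features_to_select=50, step=0.1)
rfe.fit(X, y, sample_weight=w)            # fit params are passed to the SVM
selected = rfe.get_support(indices=True)  # indices of the surviving features
```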
Results: consistent improvement in the stability of the feature selection results under the different stability measures.
Results: the different feature selection algorithms lead to similarly good classification results.
Conclusion and Future Work
Conclusion:
- Introduced the concept of the hypothesis-margin feature space;
- Proposed the framework of margin based sample weighting for stable feature selection;
- Developed an efficient algorithm under the framework.
Future work:
- Investigate alternative methods of sample weighting based on the HM feature space;
- Explore strategies to combine margin based sample weighting with group-based stable feature selection.
Questions? Thank you!