1
Graph-based Iterative Hybrid Feature Selection
Erheng Zhong†, Sihong Xie†, Wei Fan‡, Jiangtao Ren†, Jing Peng#, Kun Zhang$
† Sun Yat-sen University
‡ IBM T. J. Watson Research Center
# Montclair State University
$ Xavier University of Louisiana
2
Where we are
Supervised feature selection
Unsupervised feature selection
Semi-supervised feature selection
Hybrid: use supervised selection to include the key features, then improve them with a semi-supervised approach
3
Supervised Feature Selection
Sample selection bias problem: only feature 2 will be selected, although feature 1 is also useful.
4
Toy example (1)
Labeled data: A = (1, 1, 1, 1; red), B = (1, -1, 1, -1; blue)
Unlabeled data: C = (0, 1, 1, 1; red), D = (0, -1, 1, 1; red)
Based on A and B, features 2 and 4 are both correlated with the class, so supervised feature selection selects both of them.
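The claim about the labeled points can be checked directly. The sketch below scores each feature by its absolute Pearson correlation with the label on A and B only. The +1/-1 encoding of red/blue, the helper name feature_label_correlation, and the 0.5 threshold are illustrative assumptions, not part of the original method.

```python
import numpy as np

# Toy labeled data from the slide: rows are A and B, columns are features 1..4.
X_labeled = np.array([[1, 1, 1, 1],     # A, red
                      [1, -1, 1, -1]])  # B, blue
# Assumption for illustration: encode red as +1 and blue as -1.
y_labeled = np.array([1, -1])

def feature_label_correlation(X, y):
    """Absolute Pearson correlation of each feature with the label (0.0 for constant features)."""
    scores = []
    for j in range(X.shape[1]):
        col = X[:, j].astype(float)
        if col.std() == 0:  # a constant feature carries no class information
            scores.append(0.0)
        else:
            scores.append(abs(float(np.corrcoef(col, y)[0, 1])))
    return scores

scores = feature_label_correlation(X_labeled, y_labeled)
# Hypothetical threshold: keep features whose |correlation| exceeds 0.5.
selected = [j + 1 for j, s in enumerate(scores) if s > 0.5]
print(selected)  # [2, 4]
```

Features 1 and 3 are constant on A and B, so a purely supervised score cannot see any value in them, which is exactly the sample selection bias the previous slide describes.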
5
Semi-supervised Feature Selection
6
Toy example (2)
A semi-supervised approach: "Spectral-Based Feature Selection". Features are ranked by how smoothly they vary across neighboring data points and by their consistency with the label information. If only one feature is desired, feature 2 will be selected.
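To make "smoothness between data points" concrete, here is a minimal sketch of a Laplacian-score-style measure (He, Cai and Niyogi, 2005), swapped in as an illustration. It covers only the graph-smoothness part; the label-consistency term of the spectral method on the slide is omitted, and the RBF kernel, sigma = 1.0 and the helper name laplacian_smoothness are assumptions. A lower score means the feature changes less between strongly connected points.

```python
import numpy as np

def laplacian_smoothness(X, sigma=1.0):
    """Laplacian-score-style smoothness of each feature on an RBF similarity graph.
    Lower is smoother; constant features get +inf because they carry no information."""
    # Pairwise squared Euclidean distances and heat-kernel weights.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    L = D - W  # unnormalized graph Laplacian
    ones = np.ones(X.shape[0])
    scores = []
    for j in range(X.shape[1]):
        f = X[:, j].astype(float)
        # Remove the degree-weighted mean so a constant feature is not trivially "smooth".
        f = f - (f @ D @ ones) / (ones @ D @ ones)
        denom = f @ D @ f
        scores.append(np.inf if denom == 0 else float(f @ L @ f / denom))
    return scores

# All four toy points (A, B, C, D), labels ignored.
X_all = np.array([[1, 1, 1, 1],
                  [1, -1, 1, -1],
                  [0, 1, 1, 1],
                  [0, -1, 1, 1]])
scores = laplacian_smoothness(X_all)
print(scores)
```

This sketch does not claim to reproduce the ranking on the slide; it only shows the kind of graph-based quantity such methods optimize.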
7
Solution: Hybrid
Labeled data are insufficient: sample selection bias makes supervised feature selection fail.
Unlabeled data are indistinct: data from different classes are not separated, so semi-supervised feature selection fails.
8
Hybrid Feature Selection [IteraGraph_FS]
9
Toy example (3)
10
Properties of feature selection
In a high-dimensional feature space, the distance between any two examples is approximately the same. [Theorem 3.1]
Feature selection can yield a more distinguishable distance measure, which leads to better confidence estimates. [Theorem 3.2]
11
Theorems 3.1 and 3.2
3.1: As dimensionality increases, the nearest neighbor approaches the farthest neighbor.
3.2: A more distinguishable similarity measure yields a better classification confidence matrix.
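The effect behind Theorem 3.1 is easy to observe numerically. This sketch, assuming uniformly random points in [0, 1]^d and a single random query (both illustrative choices), measures the relative contrast (D_max - D_min) / D_min between the farthest and nearest neighbors; it shrinks as d grows, which is why a reduced feature set can give a more distinguishable distance.

```python
import numpy as np

def relative_contrast(d, n_points=500, seed=0):
    """(D_max - D_min) / D_min for distances from one random query to random points in [0, 1]^d."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, d))
    query = rng.random(d)
    dist = np.linalg.norm(points - query, axis=1)
    return (dist.max() - dist.min()) / dist.min()

dims = [2, 10, 100, 1000]
contrasts = [relative_contrast(d) for d in dims]
for d, c in zip(dims, contrasts):
    print(f"d = {d:4d}: relative contrast = {c:.3f}")
```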
12
Semi-supervised Feature Selection
Graph-based [Label Propagation]
Expand the labeled set by adding the unlabeled examples whose predicted labels have high confidence (the top s%), together with those predicted labels.
Perform feature selection on the new labeled set.
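The iterative loop described above can be sketched as follows, under several stated assumptions: the binary case only; label propagation in the closed form of Zhou et al. (2004) with an RBF graph; confidence taken as the largest normalized class score; s treated as a fraction of the remaining unlabeled points per round; and a hypothetical correlation-based select_features standing in for whatever selector is plugged in. Building the graph on the currently selected features follows the slide's point that selection gives better confidence estimates. This is an illustration, not the authors' IteraGraph_FS implementation.

```python
import numpy as np

def label_propagation(X, y, n_classes=2, sigma=1.0, alpha=0.99):
    """Closed-form propagation F = (I - alpha*S)^-1 Y, S = D^-1/2 W D^-1/2 (Zhou et al. style).
    y holds class indices for labeled points and -1 for unlabeled points."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d = np.maximum(W.sum(axis=1), 1e-12)  # guard against isolated points
    S = W / np.sqrt(np.outer(d, d))
    Y = np.zeros((X.shape[0], n_classes))
    idx = np.where(y >= 0)[0]
    Y[idx, y[idx]] = 1.0
    F = np.linalg.solve(np.eye(X.shape[0]) - alpha * S, Y)
    # Row-normalize so each row reads as class confidences.
    return F / np.maximum(F.sum(axis=1, keepdims=True), 1e-12)

def select_features(X, y, k):
    """Hypothetical stand-in selector: top-k features by |correlation| with a binary label."""
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        if X[:, j].std() > 0 and y.std() > 0:
            scores[j] = abs(np.corrcoef(X[:, j], y)[0, 1])
    return np.argsort(-scores)[:k]

def iterative_hybrid_fs(X, y, k, s=0.1, n_rounds=5):
    """Select on the labeled set, propagate labels in the selected subspace,
    add the top-s fraction of confident unlabeled points, then reselect."""
    y = y.copy()
    features = select_features(X[y >= 0], y[y >= 0], k)
    for _ in range(n_rounds):
        unlabeled = np.where(y < 0)[0]
        if unlabeled.size == 0:
            break
        F = label_propagation(X[:, features], y)
        conf = F[unlabeled].max(axis=1)
        n_add = max(1, int(s * unlabeled.size))
        top = unlabeled[np.argsort(-conf)[:n_add]]
        y[top] = F[top].argmax(axis=1)  # pseudo-label the most confident points
        features = select_features(X[y >= 0], y[y >= 0], k)
    return features, y

# Synthetic usage: 60 points, only feature 0 is informative, 4 points labeled.
rng = np.random.default_rng(0)
labels = np.array([0, 1] * 30)
X = rng.normal(size=(60, 6))
X[:, 0] += 3.0 * (2 * labels - 1)
y0 = np.full(60, -1)
y0[:4] = labels[:4]
features, y_expanded = iterative_hybrid_fs(X, y0, k=2)
print(features, (y_expanded >= 0).sum())
```

Because the selector is passed in as an ordinary function, any feature selection method could be substituted for the correlation score without changing the loop.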
13
Confidence and Margin (Lemma 3.2)
14
Selection Strategy Comparison (Theorem 3.3)
15
Experimental setup
Data sets: handwritten digit recognition; biomedical and gene expression data; text documents [Reuters-21578]
Comparison approaches:
Supervised feature selection: SFFS
Semi-supervised approach: sSelect [SDM07]
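For reference, SFFS (Sequential Floating Forward Selection, Pudil et al., 1994) alternates a greedy inclusion step with conditional exclusion steps that drop a feature only when the smaller subset beats the best subset of that size seen so far. The sketch below is a generic version; the leave-one-out 1-NN accuracy criterion and the helper names are assumptions for illustration, not the exact configuration used in these experiments.

```python
import numpy as np

def loo_1nn_accuracy(X, y, subset):
    """Leave-one-out 1-nearest-neighbor accuracy using only the features in `subset`."""
    Z = X[:, list(subset)]
    dist = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point may not be its own neighbor
    return float(np.mean(y[dist.argmin(axis=1)] == y))

def sffs(n_features, criterion, k):
    """Sequential Floating Forward Selection up to k features."""
    selected = []
    best = {}  # best criterion value seen for each subset size
    while len(selected) < k:
        # Inclusion: add the feature that gives the highest criterion value.
        candidates = [f for f in range(n_features) if f not in selected]
        f_add = max(candidates, key=lambda f: criterion(selected + [f]))
        selected = selected + [f_add]
        best[len(selected)] = max(best.get(len(selected), -np.inf), criterion(selected))
        # Conditional exclusion: drop a feature only if that beats the best smaller subset.
        while len(selected) > 2:
            drop = max(selected, key=lambda h: criterion([x for x in selected if x != h]))
            reduced = [x for x in selected if x != drop]
            score = criterion(reduced)
            if score > best.get(len(reduced), -np.inf):
                selected = reduced
                best[len(reduced)] = score
            else:
                break
    return selected

# Synthetic usage: feature 0 separates the two classes, the other 7 are noise.
rng = np.random.default_rng(1)
y = np.array([0, 1] * 30)
X = rng.normal(size=(60, 8))
X[:, 0] += 3.0 * (2 * y - 1)
chosen = sffs(X.shape[1], lambda subset: loo_1nn_accuracy(X, y, subset), k=3)
print(chosen)
```

Since the criterion is computed only on labeled data, SFFS inherits the sample selection bias discussed earlier when labels are scarce.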
16
Data Set -- Description
17
Feature Quality Study
18
Conclusions
Labeled information: identifies critical features and yields better confidence estimates.
Unlabeled data: improve the chosen feature set.
Flexible: can incorporate many feature selection methods that aim to reveal the relationships between data points.