
Graph-based Iterative Hybrid Feature Selection


1 Graph-based Iterative Hybrid Feature Selection
Erheng Zhong †, Sihong Xie †, Wei Fan ‡, Jiangtao Ren †, Jing Peng #, Kun Zhang $
† Sun Yat-sen University; ‡ IBM T. J. Watson Research Center; # Montclair State University; $ Xavier University of Louisiana

2 Where we are
Existing approaches: supervised, unsupervised, and semi-supervised feature selection.
Hybrid: use supervised selection to include the key features, then improve the chosen set with a semi-supervised approach.

3 Supervised Feature Selection
Supervised selection suffers from the sample selection bias problem: only feature 2 will be selected, but feature 1 is also useful!

4 Toy example (1)
Labeled data: A = (1, 1, 1, 1; red), B = (1, -1, 1, -1; blue).
Unlabeled data: C = (0, 1, 1, 1; red), D = (0, -1, 1, 1; red).
Based on A and B, both feature 2 and feature 4 are correlated with the class, so both are selected by supervised feature selection.
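As a quick check of toy example (1), the sketch below finds the features whose values differ between the two labeled points; with one example per class, that is all a supervised selector can use. The helper names (`separating_features`, `predicts_red_for_all`) are hypothetical, and treating the classes listed for C and D as hidden ground truth is an assumption.

```python
# Toy example (1): four features per point. C and D are unlabeled; the class
# listed for them on the slide is treated as hidden ground truth (assumption).
labeled = {"A": ((1, 1, 1, 1), "red"), "B": ((1, -1, 1, -1), "blue")}
unlabeled = {"C": (0, 1, 1, 1), "D": (0, -1, 1, 1)}  # hidden class: red, red


def separating_features(points):
    """Hypothetical helper: 1-based indices of features whose values never
    overlap between the red and the blue labeled points."""
    n_features = len(next(iter(points.values()))[0])
    selected = []
    for j in range(n_features):
        red = {x[j] for x, y in points.values() if y == "red"}
        blue = {x[j] for x, y in points.values() if y == "blue"}
        if red.isdisjoint(blue):
            selected.append(j + 1)
    return selected


def predicts_red_for_all(feature, points):
    """Hypothetical helper: on A and B, value 1 goes with red and -1 with blue
    for features 2 and 4, so check whether every point has value 1."""
    return all(x[feature - 1] == 1 for x in points.values())


print(separating_features(labeled))  # [2, 4]
for f in separating_features(labeled):
    print(f, predicts_red_for_all(f, unlabeled))
```

Both features pass the supervised check, but only feature 4 is consistent with the hidden red labels of C and D.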

5 Semi-supervised Feature Selection

6 Toy example (2)
A semi-supervised approach, “Spectral Based Feature Selection”, ranks features by their smoothness over the data points and their consistency with the label information. If only one feature is desired, feature 2 will be selected.
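The smoothness part of such a criterion can be sketched with the Laplacian Score, a related spectral criterion swapped in here for illustration. It is not necessarily the exact method named on the slide, and it omits the label-consistency term entirely. The RBF graph, `sigma = 1.0`, and the function names are assumptions; lower scores mean smoother features.

```python
import math

# Toy points A, B, C, D from the slides (labels are not used here).
points = [(1, 1, 1, 1), (1, -1, 1, -1), (0, 1, 1, 1), (0, -1, 1, 1)]


def rbf_graph(X, sigma=1.0):
    """Dense RBF similarity matrix with a zero diagonal (assumed graph)."""
    n = len(X)
    return [[0.0 if i == j else
             math.exp(-sum((a - b) ** 2 for a, b in zip(X[i], X[j])) / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]


def laplacian_scores(X, sigma=1.0):
    """Laplacian Score per feature: (f' L f) / (f' D f) after centering f."""
    W = rbf_graph(X, sigma)
    deg = [sum(row) for row in W]
    n = len(X)
    scores = []
    for r in range(len(X[0])):
        f = [x[r] for x in X]
        # Remove the degree-weighted mean; a constant feature becomes all zeros.
        mean = sum(fi * di for fi, di in zip(f, deg)) / sum(deg)
        f = [fi - mean for fi in f]
        # f' L f = 1/2 * sum_ij W_ij (f_i - f_j)^2 : small when f is smooth on the graph
        num = 0.5 * sum(W[i][j] * (f[i] - f[j]) ** 2 for i in range(n) for j in range(n))
        den = sum(di * fi * fi for fi, di in zip(f, deg))
        # A constant feature carries no information: give it the worst score.
        scores.append(num / den if den > 1e-12 else math.inf)
    return scores


scores = laplacian_scores(points)
ranking = sorted(range(len(scores)), key=lambda r: scores[r])
print([r + 1 for r in ranking])  # 1-based feature numbers, smoothest first
```

With `sigma = 1.0` this sketch ranks feature 2 first on the four toy points, in line with the slide, while the constant feature 3 gets an infinite score.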

7 Solution: Hybrid
Labeled data insufficient → sample selection bias → supervised selection fails.
Unlabeled data indistinct → data from different classes are not separated → semi-supervised selection fails.

8 Hybrid Feature Selection [IteraGraph_FS]

9 Toy example (3)

10 Properties of feature selection
In a high-dimensional feature space, the distance between any two examples is approximately the same. [Theorem 3.1]
Feature selection can yield a more distinguishable distance measure, which leads to better confidence estimates. [Theorem 3.2]

11 Theorems 3.1 and 3.2
3.1: As dimensionality increases, the distance to the nearest neighbor approaches the distance to the farthest neighbor.
3.2: A more distinguishable similarity measure → a better classification confidence matrix.
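The effect described by Theorem 3.1 can be observed empirically; this is an illustration, not a proof. The sketch draws points uniformly from the unit hypercube (an assumption chosen for simplicity) and reports the relative contrast, (farthest - nearest) / nearest, of distances from one query point. `relative_contrast` is a hypothetical name, and `math.dist` needs Python 3.8 or later.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one random query to
    n_points random points, all drawn uniformly from [0, 1]^dim (assumed)."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)


# The contrast shrinks as the dimension grows: the nearest neighbor is
# barely closer than the farthest one.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```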

12 Semi-supervised Feature Selection
Graph-based [Label Propagation]: expand the labeled set by adding the unlabeled data whose predicted labels have the highest confidence (top s%), together with those predicted labels.
Then perform feature selection on the new labeled set.
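One way to read this loop as code: build a graph on the currently selected features, propagate labels, move the top s fraction of the most confident unlabeled points into the labeled set, and reselect. This is a sketch under assumptions, not the authors' IteraGraph_FS: the clamped (harmonic) label propagation, the RBF graph, the mean-difference scorer standing in for the supervised selector, and every function name are hypothetical choices.

```python
import math

# Hypothetical sketch of the iterative hybrid loop; not the authors' IteraGraph_FS.


def rbf_weights(X, features, sigma=1.0):
    """Dense RBF similarity graph using only the selected feature indices."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((X[i][f] - X[j][f]) ** 2 for f in features)
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return W


def propagate(W, labels, classes, iters=100):
    """Harmonic label propagation: unlabeled rows take the weighted average of
    their neighbours' class distributions; labeled rows stay clamped."""
    n, k = len(W), len(classes)
    F = []
    for y in labels:
        if y is None:
            F.append([1.0 / k] * k)  # uninformative start
        else:
            F.append([1.0 if c == y else 0.0 for c in classes])
    for _ in range(iters):
        F = [F[i] if labels[i] is not None
             else [sum(W[i][j] * F[j][c] for j in range(n)) / sum(W[i])
                   for c in range(k)]
             for i in range(n)]
    return F


def mean_difference_select(X, labels, classes, k):
    """Hypothetical stand-in for the supervised selector (two classes):
    score each feature by |mean(class 0) - mean(class 1)| on labeled rows."""
    scores = []
    for f in range(len(X[0])):
        means = []
        for c in classes:
            vals = [x[f] for x, y in zip(X, labels) if y == c]
            means.append(sum(vals) / len(vals))
        scores.append(abs(means[0] - means[1]))
    ranked = sorted(range(len(scores)), key=lambda f: -scores[f])
    return sorted(ranked[:k])


def iterative_hybrid_fs(X, labels, classes, k, s=0.5, rounds=3):
    """Select features, then repeatedly add the top-s fraction of confidently
    propagated unlabeled points and reselect on the enlarged labeled set."""
    labels = list(labels)
    features = mean_difference_select(X, labels, classes, k)
    for _ in range(rounds):
        unlabeled = [i for i, y in enumerate(labels) if y is None]
        if not unlabeled:
            break
        F = propagate(rbf_weights(X, features), labels, classes)
        unlabeled.sort(key=lambda i: -max(F[i]))  # most confident first
        for i in unlabeled[:max(1, int(len(unlabeled) * s))]:
            labels[i] = classes[F[i].index(max(F[i]))]
        features = mean_difference_select(X, labels, classes, k)
    return features, labels


# Toy example from the slides: A, B labeled; C, D unlabeled.
X = [(1, 1, 1, 1), (1, -1, 1, -1), (0, 1, 1, 1), (0, -1, 1, 1)]
features, labels = iterative_hybrid_fs(X, ["red", "blue", None, None], ["red", "blue"], k=2)
print(features, labels)  # feature indices are 0-based
```

On the toy data with k = 2, the first round adds C as red, the second adds D as red, and features 2 and 4 (0-based indices 1 and 3) remain selected. Reselecting on the enlarged labeled set is where any other supervised selector could be plugged in.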

13 Confidence and Margin (Lemma 3.2)

14 Selection Strategy Comparison (Theorem 3.3)

15 Experimental setup
Data sets: handwritten digit recognition; biomedical and gene expression data; text documents [Reuters-21578].
Compared approaches: supervised feature selection, SFFS; semi-supervised approach, sSelect [SDM07].

16 Data Set -- Description

17 Feature Quality Study

18 Conclusions
Labeled information → critical features and better confidence estimates.
Unlabeled data → improves the chosen feature set.
Flexible: can incorporate many feature selection methods that aim to reveal the relationships between data points.
