
Feature Selection Analysis



Presentation on theme: "Feature Selection Analysis" — Presentation transcript:

1 Feature Selection Analysis
An attempt at a generalized relationship between sample size and dimensionality
Project for 9.520
Nathan Eagle

2 Motivation Expense of taking/labeling additional sample data
How much training data is really necessary?

3 Empirical Evidence – “s-curve”
SVM(fu) Classifier on Sayan’s Feature Selection Technique

4 Empirical Evidence – linearity
Linear relationship between samples and dimensions
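The claimed linearity can be probed with a small numerical sketch. This is hypothetical code, not part of the original project: it uses a nearest-centroid classifier on synthetic two-class Gaussian data where only the first feature is relevant, and the helper names `accuracy` and `samples_needed` are invented here. For each count of irrelevant features it searches for the smallest training size reaching a target accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)

def accuracy(n_train, n_irrelevant, n_test=1000):
    # Two Gaussian classes that differ only in the first feature;
    # the remaining n_irrelevant features are pure noise.
    d = 1 + n_irrelevant
    mu = np.zeros(d)
    mu[0] = 1.0  # class means sit at -mu and +mu

    def sample(n):
        y = np.arange(n) % 2  # balanced labels
        X = rng.standard_normal((n, d)) + np.outer(2 * y - 1, mu)
        return X, y

    Xtr, ytr = sample(n_train)
    Xte, yte = sample(n_test)
    c0 = Xtr[ytr == 0].mean(axis=0)
    c1 = Xtr[ytr == 1].mean(axis=0)
    pred = (np.linalg.norm(Xte - c1, axis=1) <
            np.linalg.norm(Xte - c0, axis=1)).astype(int)
    return (pred == yte).mean()

def samples_needed(n_irrelevant, target=0.75, step=20, cap=2000):
    # Smallest training size (in multiples of `step`) reaching `target` accuracy.
    for n in range(step, cap + 1, step):
        if accuracy(n, n_irrelevant) >= target:
            return n
    return cap

for d in (10, 100, 200):
    print(d, samples_needed(d))
```

On this toy setup the required training size grows roughly in proportion to the number of irrelevant dimensions, consistent with the plot described on this slide.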

5 Proof I – Hypothesis Testing
(1) (2)
But what are the priors, p_feat? What if there is more than one relevant feature?
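The bodies of equations (1) and (2) are not reproduced in this transcript. As a purely hypothetical sketch of how a prior p_feat could enter such a test, a Bayesian relevance check on a single feature might take the form:

```latex
% Hypothetical form only: posterior relevance of a feature given data x,
% with prior p_feat on the feature being relevant (Bayes' rule).
\Pr(\text{relevant} \mid x) =
  \frac{p_{\text{feat}}\, p(x \mid \text{relevant})}
       {p_{\text{feat}}\, p(x \mid \text{relevant})
        + (1 - p_{\text{feat}})\, p(x \mid \text{irrelevant})}
```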

6 Proof II – Chebyshev and Weak Law of Large Numbers
(3)
From the W.L.L.N.: (4)
From Chebyshev's inequality: (5)
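The referenced results can be stated in their standard forms; the exact mapping onto equation numbers (3)-(5) is an assumption:

```latex
% Weak Law of Large Numbers: the sample mean converges in probability.
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\lim_{n\to\infty}\Pr\bigl(\lvert\bar{X}_n - \mu\rvert \ge \varepsilon\bigr) = 0 .

% Chebyshev's inequality applied to \bar{X}_n, which has variance \sigma^2/n:
\Pr\bigl(\lvert\bar{X}_n - \mu\rvert \ge \varepsilon\bigr)
  \le \frac{\sigma^2}{n\varepsilon^2} .
```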

7 Proof II (cont)
From before: (6)
Inverting the probability: (7)
For all features: (8)
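Equations (6)-(8) are not reproduced above. A hedged reconstruction consistent with the surrounding derivation: inverting Chebyshev's bound gives a per-feature confidence statement, and a union bound extends it to all d features simultaneously:

```latex
% Per feature: complement of the Chebyshev bound.
\Pr\bigl(\lvert\bar{X}_n - \mu\rvert < \varepsilon\bigr)
  \ge 1 - \frac{\sigma^2}{n\varepsilon^2} .

% All d features at once (union bound over the d failure events):
\Pr\Bigl(\max_{1\le j\le d}\,\lvert\bar{X}_n^{(j)} - \mu_j\rvert < \varepsilon\Bigr)
  \ge 1 - \frac{d\,\sigma^2}{n\varepsilon^2} .
```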

8 Proof II (cont)
From before: (9)
Setting:
[Plot: Sample Size vs. Dimensions; axes: Irrelevant Dimensions/Features, Training Sample Size]
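Assuming equation (9) is the union-bound statement from the previous slide, fixing a failure probability delta and solving for n yields a sample size linear in the number of features, which matches the plotted trend:

```latex
\frac{d\,\sigma^2}{n\varepsilon^2} \le \delta
\quad\Longrightarrow\quad
n \ge \frac{\sigma^2}{\delta\,\varepsilon^2}\, d ,
```

i.e. the required training-sample size grows linearly in d.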

9 Proof III – Sayan’s Generalization Error Algorithm
Generalization error for two classes drawn from Gaussian distributions:* (10)
where the separating hyperplane is defined as: (11)
Fisher Linear Discriminant
* As proved in Sayan Mukherjee's PhD thesis
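The Fisher linear discriminant for two Gaussian classes with shared covariance, and the resulting error for equal priors, have standard forms; whether these match equations (10)-(11) exactly is an assumption:

```latex
% Separating hyperplane w^T x + b = 0 (Fisher linear discriminant),
% for class means \mu_0, \mu_1 and shared covariance \Sigma:
w = \Sigma^{-1}(\mu_1 - \mu_0), \qquad
b = -\tfrac{1}{2}\, w^{\top}(\mu_0 + \mu_1) .

% Error for equal priors, with \Phi the standard normal CDF:
\Pr[\text{error}] = \Phi\!\Bigl(-\tfrac{1}{2}
  \sqrt{(\mu_1 - \mu_0)^{\top}\Sigma^{-1}(\mu_1 - \mu_0)}\Bigr) .
```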

10 Results
[Plots: theoretical vs. empirical results; 1 iteration and 50 iterations]

11 Conclusions
Sample size appears to scale linearly with the number of irrelevant features, both empirically and theoretically, regardless of the classifier.
The 's-curve' does not seem to be a generalized property of all feature selection methods.

