Feature Selection Analysis
An attempt at a generalized relationship between sample size and dimensionality
Project for 9.520
Nathan Eagle
Motivation
Taking and labeling additional sample data is expensive. How much training data is really necessary?
Empirical Evidence – the "s-curve"
SvmFu classifier on Sayan's feature selection technique
Empirical Evidence – linearity
Linear relationship between the number of training samples and the number of dimensions (a sketch of the experiment follows below)
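The curves on these two slides came from SvmFu and Sayan's feature selection technique; the following is only a minimal sketch of the same kind of experiment, swapping in scikit-learn's SVC and synthetic Gaussian data with a single relevant feature. The signal strength, the 25% error threshold, and the sweep ranges are illustrative assumptions, not the original setup.

```python
# Sketch: test error vs. training-set size, and the smallest sample size
# reaching a target error as irrelevant dimensions grow. SVC stands in
# for SvmFu; data, thresholds, and ranges are assumptions.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def make_data(n, d_irrelevant):
    """Two classes separated along feature 0; all other features are noise."""
    y = np.arange(n) % 2                       # balanced class labels
    X = rng.normal(size=(n, 1 + d_irrelevant))
    X[:, 0] += 2.0 * (2 * y - 1)               # class signal on feature 0 only
    return X, y

def test_error(n_train, d_irrelevant, n_test=2000):
    X_tr, y_tr = make_data(n_train, d_irrelevant)
    X_te, y_te = make_data(n_test, d_irrelevant)
    clf = SVC(kernel="linear").fit(X_tr, y_tr)
    return np.mean(clf.predict(X_te) != y_te)

# s-curve: error as a function of sample size at fixed dimensionality
for n in [5, 10, 20, 40, 80, 160]:
    print(n, test_error(n, d_irrelevant=100))

# linearity: smallest n reaching 25% error as dimensionality grows
for d in [25, 50, 100, 200]:
    n_needed = next((n for n in range(5, 2000, 5)
                     if test_error(n, d) <= 0.25), None)
    print(d, n_needed)
```

Plotting the first loop's output against n gives the s-shaped error curve; plotting n_needed against d gives the roughly linear trend the next slides derive.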
Proof I – Hypothesis Testing
(1) (2)
But what are the priors, $p_{\text{feat}}$? What if there is more than one relevant feature?
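Equations (1) and (2) are not reproduced here. One plausible shape for this kind of test is sketched below: a per-feature hypothesis test on the class-conditional means, combined with a prior $p_{\text{feat}}$ on relevance. Both the statistic and the Bayes-rule form are assumptions, not a reconstruction of the original slide.

% Sketch (assumed form): H_0 says feature j is irrelevant, H_1 relevant.
\[
  H_0: \mu_j^{(+)} = \mu_j^{(-)}, \qquad
  H_1: \mu_j^{(+)} \neq \mu_j^{(-)}
\]
% A prior p_feat on relevance makes the test Bayesian, which is where the
% slide's question about priors bites:
\[
  P(H_1 \mid \text{data}) =
  \frac{p_{\text{feat}}\, P(\text{data} \mid H_1)}
       {p_{\text{feat}}\, P(\text{data} \mid H_1)
        + (1 - p_{\text{feat}})\, P(\text{data} \mid H_0)}
\]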
Proof II – Chebyshev and the Weak Law of Large Numbers
Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ be the sample mean of $n$ i.i.d. samples of a feature with mean $\mu$ and variance $\sigma^2$ (3)
From the W.L.L.N.: $\lim_{n \to \infty} P\left(\left|\bar{X}_n - \mu\right| \ge \epsilon\right) = 0$ (4)
From Chebyshev's inequality: $P\left(\left|\bar{X}_n - \mu\right| \ge \epsilon\right) \le \frac{\sigma^2}{n\epsilon^2}$ (5)
Proof II (cont)
From before: $P\left(\left|\bar{X}_n - \mu\right| \ge \epsilon\right) \le \frac{\sigma^2}{n\epsilon^2}$ (6)
Inverting the probability: $P\left(\left|\bar{X}_n - \mu\right| < \epsilon\right) \ge 1 - \frac{\sigma^2}{n\epsilon^2}$ (7)
For all $d$ features (assuming independence): $P\left(\text{every feature mean within } \epsilon\right) \ge \left(1 - \frac{\sigma^2}{n\epsilon^2}\right)^d$ (8)
Proof II (cont)
From before: $\left(1 - \frac{\sigma^2}{n\epsilon^2}\right)^d \ge 1 - \delta$ (9)
Setting the confidence to $1 - \delta$ and solving for $n$ shows the training sample size grows linearly with the number of dimensions $d$.
[Plot: Sample Size vs. Dimensions – training sample size (y-axis) against irrelevant dimensions/features (x-axis)]
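Spelling out that last step under the same assumptions (i.i.d. features, per-feature variance $\sigma^2$), a first-order expansion of (9) isolates $n$ and makes the linear dependence on $d$ explicit:

% First-order expansion (1 - x)^d ~ 1 - dx for small x, then solve for n:
\[
  \left(1 - \frac{\sigma^2}{n\epsilon^2}\right)^d
  \approx 1 - \frac{d\,\sigma^2}{n\,\epsilon^2} \;\ge\; 1 - \delta
  \quad\Longrightarrow\quad
  n \;\ge\; \frac{d\,\sigma^2}{\delta\,\epsilon^2}
\]

For example, with assumed values $\sigma^2 = 1$, $\epsilon = 0.5$, $\delta = 0.05$, this gives $n \ge 80\,d$: doubling the number of irrelevant dimensions doubles the required training sample size.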
Proof III – Sayan's Generalization Error Algorithm
Generalization error for two classes drawn from Gaussian distributions:* (10)
where the separating hyperplane is defined by the Fisher Linear Discriminant: $w = \Sigma^{-1}(\mu_1 - \mu_2)$ (11)
* As proved in Sayan Mukherjee's PhD thesis
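The exact expression (10) is proved in the thesis and is not reproduced here. For reference, the standard closed form for this setting (which may differ from the thesis's expression) is the Bayes error of two equal-covariance Gaussian classes $N(\mu_1, \Sigma)$ and $N(\mu_2, \Sigma)$ with equal priors:

% Bayes error in terms of the Mahalanobis distance Delta between the means:
\[
  P(\text{error}) = \Phi\!\left(-\frac{\Delta}{2}\right),
  \qquad
  \Delta^2 = (\mu_1 - \mu_2)^{\top}\,\Sigma^{-1}\,(\mu_1 - \mu_2)
\]

Here $\Phi$ is the standard normal CDF; the Fisher Linear Discriminant in (11) attains this error when the true class means and covariance are known.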
Results
[Plots: theoretical vs. empirical curves, shown for 1 iteration and for 50 iterations]
Conclusions
Sample size appears to scale linearly with the number of irrelevant features, both empirically and theoretically, regardless of the classifier.
The "s-curve" does not seem to be a general property of all feature selection methods.