1
Measuring the Stability of Feature Selection
Sarah Nogueira and Gavin Brown, School of Computer Science, University of Manchester
2
LOTS OF FEATURES! → Learning Algorithm → Predictive model
Problems: too expensive, curse of dimensionality, lack of interpretability
3
BIG DATA → Feature Selection → FEATURE SET → Learning Algorithm → Predictive model
Stability is the analogous concept to variance for feature sets (recall MSE = bias² + var): the sensitivity of the selected features to small perturbations in the data.
4
Stability: a recent and growing area of research
An indicator of reproducible research. In biomarker identification, stability is said to be as important as predictive power; instability has been a major obstacle to clinical applications.
5
Literature to quantify stability!
Jaccard (2002), POG (2006), Hamming (2007), Kuncheva (2007), Krizek (2007), Dice (2008), nPOG (2009), Lustgarten (2009), CWrel (2010), Wald (2013)
Lots of definitions: some statistical, some heuristic. Conflicting opinions; different use cases.
6
Measuring Stability
DATA → Sample 1, Sample 2, …, Sample M → Feature Selection on each sample → feature sets s1, s2, …, sM
Stability = Φ(s1, …, sM) = 1/(M(M−1)) · Σ_{i≠j} sim(si, sj)
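As a concrete sketch of the definition above (not the authors' code), here is the average pairwise similarity over M feature sets, using Jaccard similarity as one possible choice of sim(si, sj):

```python
# Stability as the average pairwise similarity between the M feature sets.
# Jaccard similarity is used purely as an example of sim(s_i, s_j);
# any pairwise similarity measure fits the same template.
def jaccard(s1, s2):
    """Jaccard similarity: |intersection| / |union|."""
    return len(s1 & s2) / len(s1 | s2)

def stability(feature_sets, sim=jaccard):
    """Phi(s_1, ..., s_M) = 1/(M(M-1)) * sum over i != j of sim(s_i, s_j)."""
    M = len(feature_sets)
    total = sum(sim(feature_sets[i], feature_sets[j])
                for i in range(M) for j in range(M) if i != j)
    return total / (M * (M - 1))

# Hypothetical runs: two identical feature sets and one differing set.
sets = [{1, 2, 3}, {1, 2, 3}, {2, 3, 5}]
print(stability(sets))  # -> 0.666...
```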
7
What properties/behaviours should a measure have, such that its values are interpretable and comparable?
Jaccard (2002), POG (2006), Hamming (2007), Kuncheva (2007), Krizek (2007), Dice (2008), nPOG (2009), Lustgarten (2009), CWrel (2010), Wald (2013)
8
Imagine we had d features… then each selected feature set is a binary string of length d.
Select features 1, 2, 3 → 11100
Select features 3, 5 → 00101
9
Imagine we had d features… then each selected feature set is a binary string of length d.
Select EXACTLY 3 features, M times → an M × d binary matrix (rows: 11100, 10110, 00111, …, 11010).
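The binary-matrix view above can be sketched as follows (hypothetical data: M = 4 runs, d = 5 features, exactly 3 selected per run):

```python
import numpy as np

# Hypothetical selection runs: each row is one feature set encoded as a
# binary string of length d, stacked into an M x d matrix.
Z = np.array([[1, 1, 1, 0, 0],
              [1, 0, 1, 1, 0],
              [0, 0, 1, 1, 1],
              [1, 1, 0, 1, 0]])

print(Z.shape)        # (4, 5): an M x d binary matrix
print(Z.sum(axis=1))  # [3 3 3 3]: exactly 3 features selected per run
```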
10
Desirable property 1: Fully defined
Sometimes select 2 features, sometimes 3 (e.g. rows 11100, 00110, 00111, …, 10010). Not all measures work in this scenario! The measure should be defined for all possible collections of feature sets.
11
Desirable property 2: Bounds
Φ should be bounded by constants, so that its extremes correspond to fully stable and random selection.
12
Desirable property 3: Maximum
With Lustgarten's (2009) measure, two collections A and B of identical feature sets give Φ(A) = 0.6 and Φ(B) = 0.8: all feature sets being identical does not guarantee that Φ reaches its maximum.
13
Desirable property 3: Maximum
With Wald (2013) and CWrel (2010), Φ reaches its maximal value of 1 even though the feature sets are not all identical: Φ reaching its maximum does not guarantee that all feature sets are identical.
14
Desirable property 4: Correction for chance
Random selection (arbitrary binary strings, e.g. 000…00, 11110…000): require E[Φ] = 0 when the selection is random.
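To see why correction for chance matters, here is a small simulation (an assumed setup, not from the talk): an uncorrected measure such as raw Jaccard similarity stays well above 0 even when features are picked uniformly at random, whereas the property asks for an expected value of 0 in that case.

```python
import random

def jaccard(a, b):
    """Jaccard similarity: |intersection| / |union|."""
    return len(a & b) / len(a | b)

random.seed(0)
d, k, M = 100, 10, 50  # d features, k selected per run, M runs

# Purely random selection: k features drawn without replacement, M times.
sets = [frozenset(random.sample(range(d), k)) for _ in range(M)]

# Average raw Jaccard over all unordered pairs of the random feature sets.
pairs = [(i, j) for i in range(M) for j in range(i + 1, M)]
avg = sum(jaccard(sets[i], sets[j]) for i, j in pairs) / len(pairs)
print(avg > 0.0)  # True: positive on average, despite random selection
```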
15
Properties: Fully defined | Bounds | Maximum | Correction for chance
Checking Jaccard (2002), POG (2006), Hamming (2007), Kuncheva (2007), Krizek (2007), Dice (2008), nPOG (2009), Lustgarten (2009), CWrel (2010) and Wald (2013) against these four properties: each measure satisfies some of the properties, but none satisfies all four.
16
Properties: Fully defined | Bounds | Maximum | Correction for chance
Adding Pearson's correlation to the comparison: it satisfies all four properties.
17
Average pairwise Pearson's correlation
Fully stable: Φ = 1. Intermediate: Φ = 0.58. Random selection: Φ = 0.
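A minimal sketch of the Pearson-based measure, assuming the binary-matrix representation of the feature sets (this is an illustration, not the authors' MATLAB implementation):

```python
import numpy as np

def pearson_stability(Z):
    """Average pairwise Pearson correlation between the rows of the
    M x d binary selection matrix Z (one row per feature-selection run)."""
    M = Z.shape[0]
    C = np.corrcoef(Z)  # M x M matrix of pairwise row correlations
    return C[~np.eye(M, dtype=bool)].mean()  # mean of off-diagonal entries

# Fully stable: the same 3 features selected in every one of 4 runs.
stable = np.tile([1, 1, 1, 0, 0], (4, 1))
print(pearson_stability(stable))  # fully stable -> 1.0 (up to floating point)
```

Note that each row needs nonzero variance (i.e. not all features selected, not none) for the pairwise correlations to be defined.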
18
Experiments: use L1-regularized logistic regression with regularizing parameter λ. Can we increase stability without loss of accuracy? Optimal Pareto trade-off: minimal error and maximal stability. Selection of a regularizing parameter that improves stability without loss of predictive power.
19
Conclusions
Increasing stability brings more confidence in the features selected in the model. Pearson's correlation can do the job, having all the desirable properties. Implementation in MATLAB available online at:
20
Thank you