Download presentation
Presentation is loading. Please wait.
Published byMoris Martin Modified over 8 years ago
1
Lecture 5: Causality and Feature Selection Isabelle Guyon isabelle@clopinet.com
2
Variable/feature selection Remove features X i to improve (or least degrade) prediction of Y. X Y
3
What can go wrong? Guyon-Aliferis-Elisseeff, 2007
4
What can go wrong?
5
Guyon-Aliferis-Elisseeff, 2007 X2X2 Y X1X1 Y X1X1 X2X2
6
X Y Causal feature selection Uncover causal relationships between X i and Y. Y
7
Lung cancer Causal feature relevance
8
Lung cancer Causal feature relevance
9
Lung cancer Causal feature relevance
10
Lung cancer Markov Blanket Strongly relevant features (Kohavi-John, 1997) Markov Blanket (Tsamardinos-Aliferis, 2003)
11
Feature relevance Surely irrelevant feature X i : P(X i, Y |S \i ) = P(X i |S \i )P(Y |S \i ) for all S \i X \i and all assignment of values to S \i Strongly relevant feature X i : P(X i, Y |X \i ) P(X i |X \i )P(Y |X \i ) for some assignment of values to X \i Weakly relevant feature X i : P(X i, Y |S \i ) P(X i |S \i )P(Y |S \i ) for some assignment of values to S \i X \i
12
Lung cancer Markov Blanket Strongly relevant features (Kohavi-John, 1997) Markov Blanket (Tsamardinos-Aliferis, 2003)
13
Lung cancer Strongly relevant features (Kohavi-John, 1997) Markov Blanket (Tsamardinos-Aliferis, 2003) PARENTS Markov Blanket
14
Lung cancer Strongly relevant features (Kohavi-John, 1997) Markov Blanket (Tsamardinos-Aliferis, 2003) CHILDREN Markov Blanket
15
Lung cancer Strongly relevant features (Kohavi-John, 1997) Markov Blanket (Tsamardinos-Aliferis, 2003) SPOUSES Markov Blanket
16
Causal relevance Surely irrelevant feature X i : P(X i, Y |S \i ) = P(X i |S \i )P(Y |S \i ) for all S \i X \i and all assignment of values to S \i Causally relevant feature X i : P(X i,Y| do( S \i ) ) P(X i | do( S \i ) )P(Y| do( S \i ) ) for some assignment of values to S \i Weak/strong causal relevance: –Weak=ancestors, indirect causes –Strong=parents, direct causes.
17
Lung cancer Examples
18
Smoking Lung cancer Immediate causes (parents) Genetic factor1
19
Smoking Anxiety Lung cancer Non-immediate causes (other ancestors)
20
Genetic factor1 Other cancers Lung cancer Non causes (e.g. siblings)
21
Y X || Y | C FORK CHAIN X X C C
22
Smoking Anxiety Lung cancer Hidden more direct cause Tar in lungs
23
Smoking Lung cancer Confounder Genetic factor2
24
Coughing Metastasis Lung cancer Biomarker1 Immediate consequences (children)
25
Lung cancer Strongly relevant features (Kohavi-John, 1997) Markov Blanket (Tsamardinos-Aliferis, 2003) X C X C X C X || Y but X || Y | C
26
Lung cancer Bio- marker2 Biomarker1 Non relevant spouse (artifact)
27
Lung cancer Bio- marker2 Biomarker1 Another case of confounder Systematic noise
28
Coughing Allergy Lung cancer Truly relevant spouse
29
Hormonal factor Metastasis Lung cancer Sampling bias
30
Coughing Allergy Smoking Anxiety Genetic factor1 Hormonal factor Metastasis (b) Other cancers Lung cancer Genetic factor2 Tar in lungs Bio- marker2 Biomarker1 Systematic noise Causal feature relevance
31
Conclusion Feature selection focuses on uncovering subsets of variables X 1, X 2, … predictive of the target Y. Multivariate feature selection is in principle more powerful than univariate feature selection, but not always in practice. Taking a closer look at the type of dependencies in terms of causal relationships may help refining the notion of variable relevance.
32
1)Feature Extraction, Foundations and Applications I. Guyon et al, Eds. Springer, 2006. http://clopinet.com/fextract-book 2) Causal feature selection I. Guyon, C. Aliferis, A. Elisseeff In “Computational Methods of Feature Selection”, Huan Liu and Hiroshi Motoda Eds., Chapman and Hall/CRC Press, 2007. http://clopinet.com/isabelle/Papers/causalFS.pdf Acknowledgements and references
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.