Lecture 5: Causality and Feature Selection Isabelle Guyon


Variable/feature selection: remove features X_i to improve (or at least not degrade) the prediction of Y.
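To ground the idea, here is a minimal scikit-learn sketch of my own (the synthetic dataset, model, and k are illustrative, not from the lecture): keeping only a few informative features can match, or even improve on, prediction with all features.

```python
# Illustrative only: compare a classifier on all features vs. the top-k
# features ranked by mutual information with Y.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=50, n_informative=5,
                           random_state=0)
clf = LogisticRegression(max_iter=1000)
print("all 50 features:", cross_val_score(clf, X, y, cv=5).mean().round(3))
X_sel = SelectKBest(mutual_info_classif, k=5).fit_transform(X, y)
print("top 5 features: ", cross_val_score(clf, X_sel, y, cv=5).mean().round(3))
```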

What can go wrong? (Guyon-Aliferis-Elisseeff, 2007)
[Figure: two toy graphs over X_1, X_2, and the target Y illustrating how feature selection can fail]

Causal feature selection: uncover the causal relationships between the X_i and Y.

Causal feature relevance
[Figure: causal graph around Lung cancer, developed over the following slides]

Markov blanket of Lung cancer: strongly relevant features (Kohavi-John, 1997) ⇔ Markov blanket (Tsamardinos-Aliferis, 2003).

Feature relevance (X_\i denotes all features except X_i; S_\i ⊆ X_\i is a subset of them):
– Surely irrelevant feature X_i: P(X_i, Y | S_\i) = P(X_i | S_\i) P(Y | S_\i) for all S_\i ⊆ X_\i and all assignments of values to S_\i.
– Strongly relevant feature X_i: P(X_i, Y | X_\i) ≠ P(X_i | X_\i) P(Y | X_\i) for some assignment of values to X_\i.
– Weakly relevant feature X_i: P(X_i, Y | S_\i) ≠ P(X_i | S_\i) P(Y | S_\i) for some assignment of values to some S_\i ⊆ X_\i, while X_i is not strongly relevant.
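To make the three classes concrete, here is a small simulation of my own (the variables and the 1e-3 nats threshold are illustrative): X1 carries the signal, X2 is an exact duplicate of X1, and X3 is noise. Conditional mutual information I(X_i; Y | S) ≈ 0 signals conditional independence; note that duplication leaves neither copy strongly relevant.

```python
# Illustrative simulation: X1 = noisy copy of Y, X2 = duplicate of X1,
# X3 = pure noise. Classify each feature by conditional mutual information
# I(Xi; Y | S) over all subsets S of the other features (~0 = independent).
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
y = rng.integers(0, 2, n)
data = {"X1": y ^ (rng.random(n) < 0.1)}   # strongly predictive signal
data["X2"] = data["X1"].copy()             # exact duplicate of X1
data["X3"] = rng.integers(0, 2, n)         # irrelevant noise

def cond_mi(x, y, zs):
    """Empirical I(X; Y | Z1..Zk) in nats for binary variables."""
    cols = [x, y] + list(zs)
    counts = np.zeros((2,) * len(cols))
    np.add.at(counts, tuple(np.asarray(c, int) for c in cols), 1)
    p = counts / counts.sum()
    pz = p.sum(axis=(0, 1), keepdims=True)   # P(Z)
    pxz = p.sum(axis=1, keepdims=True)       # P(X, Z)
    pyz = p.sum(axis=0, keepdims=True)       # P(Y, Z)
    with np.errstate(divide="ignore", invalid="ignore"):
        t = p * np.log(p * pz / (pxz * pyz))
    return float(np.nansum(t))

for name in data:
    rest = [r for r in data if r != name]
    mi = {S: cond_mi(data[name], y, [data[s] for s in S])
          for k in range(len(rest) + 1)
          for S in itertools.combinations(rest, k)}
    if mi[tuple(rest)] > 1e-3:
        label = "strongly relevant"      # dependent given ALL other features
    elif max(mi.values()) > 1e-3:
        label = "weakly relevant"        # dependent given SOME subset only
    else:
        label = "surely irrelevant"      # independent given every subset
    print(name, "->", label)
# X1, X2 -> weakly relevant (each redundant given its duplicate); X3 -> irrelevant
```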

Strongly relevant features (Kohavi-John, 1997) ⇔ Markov blanket (Tsamardinos-Aliferis, 2003). The PARENTS of Lung cancer belong to the Markov blanket.

Strongly relevant features (Kohavi-John, 1997) ⇔ Markov blanket (Tsamardinos-Aliferis, 2003). The CHILDREN of Lung cancer belong to the Markov blanket.

Strongly relevant features (Kohavi-John, 1997) ⇔ Markov blanket (Tsamardinos-Aliferis, 2003). The SPOUSES of Lung cancer (other parents of its children) belong to the Markov blanket.
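Reading the blanket off a known DAG is mechanical. Here is a sketch of my own (a hypothetical helper, with node names borrowed from the lecture's running example) using networkx.

```python
# Sketch: read the Markov blanket (parents + children + spouses) of a node
# off a known causal DAG. Node names follow the lecture's example graph.
import networkx as nx

g = nx.DiGraph([
    ("Smoking", "Lung cancer"), ("Genetic factor 1", "Lung cancer"),  # parents
    ("Lung cancer", "Coughing"), ("Lung cancer", "Metastasis"),       # children
    ("Allergy", "Coughing"), ("Hormonal factor", "Metastasis"),       # spouses
    ("Anxiety", "Smoking"),                      # ancestor, NOT in the blanket
])

def markov_blanket(g, node):
    parents = set(g.predecessors(node))
    children = set(g.successors(node))
    spouses = {p for c in children for p in g.predecessors(c)} - {node}
    return parents | children | spouses

print(markov_blanket(g, "Lung cancer"))
# {'Smoking', 'Genetic factor 1', 'Coughing', 'Metastasis', 'Allergy',
#  'Hormonal factor'} -- Anxiety is correctly excluded
```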

Causal relevance:
– Surely irrelevant feature X_i: P(X_i, Y | S_\i) = P(X_i | S_\i) P(Y | S_\i) for all S_\i ⊆ X_\i and all assignments of values to S_\i.
– Causally relevant feature X_i: P(X_i, Y | do(S_\i)) ≠ P(X_i | do(S_\i)) P(Y | do(S_\i)) for some assignment of values to S_\i.
– Weak/strong causal relevance:
  – Weak = ancestors, indirect causes.
  – Strong = parents, direct causes.
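A small simulation of my own illustrates why do() differs from plain conditioning (all probabilities are invented): a confounder G drives both X and Y, and X has no causal effect on Y at all; yet observing X shifts P(Y), while intervening on X does not.

```python
# Illustrative simulation: confounder G causes both X and Y; X does NOT
# cause Y. Observing X changes P(Y); intervening with do(X) does not.
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

def simulate(do_x=None):
    g = rng.random(n) < 0.5                        # confounder G
    x = rng.random(n) < np.where(g, 0.8, 0.2)      # X <- G
    if do_x is not None:                           # do(X): override X's
        x = np.full(n, do_x)                       # mechanism, severing G -> X
    y = rng.random(n) < np.where(g, 0.7, 0.1)      # Y <- G only
    return x, y

x, y = simulate()
print("P(Y=1 | X=1)     =", round(y[x].mean(), 2))    # ~0.58 (confounding)
print("P(Y=1 | X=0)     =", round(y[~x].mean(), 2))   # ~0.22
print("P(Y=1 | do(X=1)) =", round(simulate(do_x=True)[1].mean(), 2))   # ~0.40
print("P(Y=1 | do(X=0)) =", round(simulate(do_x=False)[1].mean(), 2))  # ~0.40
```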

Examples: the lung cancer network.

Immediate causes (parents): Smoking → Lung cancer; Genetic factor 1 → Lung cancer.

Non-immediate causes (other ancestors): Anxiety → Smoking → Lung cancer.

Non-causes (e.g., siblings): Other cancers shares the parent Genetic factor 1 with Lung cancer but does not cause it.

FORK: X ← C → Y. CHAIN: X → C → Y. In both structures X and Y are dependent, but X ⊥ Y | C (see the simulation sketch after the collider slide below).

Hidden more direct cause: Anxiety → Smoking → Tar in lungs → Lung cancer (Tar in lungs is a hidden, more direct cause than Smoking).

Confounder: Genetic factor 2 causes both Smoking and Lung cancer.

Immediate consequences (children): Lung cancer → Coughing, Metastasis, Biomarker 1.

Strongly relevant features (Kohavi-John, 1997) ⇔ Markov blanket (Tsamardinos-Aliferis, 2003). COLLIDER: X → C ← Y. Here X ⊥ Y, but X ⊥̸ Y | C: conditioning on the common child C creates a dependence between the spouses, which is why spouses belong to the Markov blanket.
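The fork and chain patterns from the earlier slide and the collider pattern above can be checked numerically. Below is a sketch of my own using partial correlation as a (Gaussian) conditional-dependence measure.

```python
# Illustrative simulation of the three patterns. Partial correlation given C
# (residualise X and Y on C, then correlate) stands in for a conditional
# (in)dependence test under Gaussian assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
noise = lambda: rng.normal(size=n)

def pcorr(x, y, c):
    rx = x - np.polyval(np.polyfit(c, x, 1), c)
    ry = y - np.polyval(np.polyfit(c, y, 1), c)
    return np.corrcoef(rx, ry)[0, 1]

c = noise(); x = c + noise(); y = c + noise()          # FORK: X <- C -> Y
print(f"fork:     corr={np.corrcoef(x, y)[0,1]:+.2f}  pcorr|C={pcorr(x,y,c):+.2f}")

x = noise(); c = x + noise(); y = c + noise()          # CHAIN: X -> C -> Y
print(f"chain:    corr={np.corrcoef(x, y)[0,1]:+.2f}  pcorr|C={pcorr(x,y,c):+.2f}")

x = noise(); y = noise(); c = x + y + noise()          # COLLIDER: X -> C <- Y
print(f"collider: corr={np.corrcoef(x, y)[0,1]:+.2f}  pcorr|C={pcorr(x,y,c):+.2f}")
# fork/chain: dependent marginally, independent given C;
# collider: independent marginally, dependent (pcorr ~ -0.5) given C.
```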

Non-relevant spouse (artifact): Biomarker 2 → Biomarker 1 ← Lung cancer; Biomarker 2 enters the blanket as a spouse yet has no causal relation to Lung cancer.

Another case of confounder: Systematic noise drives both Biomarker 1 and Biomarker 2.

Truly relevant spouse: Allergy → Coughing ← Lung cancer.

Sampling bias: Hormonal factor → Metastasis ← Lung cancer.

Causal feature relevance (summary)
[Figure: full causal graph with Anxiety, Smoking, Tar in lungs, Genetic factors 1 and 2, Other cancers, Lung cancer, Coughing, Allergy, Metastasis, Hormonal factor, Biomarkers 1 and 2, and Systematic noise]

Formalism: causal Bayesian networks.
Bayesian network:
– Graph with random variables X_1, X_2, …, X_n as nodes.
– Dependencies represented by edges.
– Allows us to compute P(X_1, X_2, …, X_n) as ∏_i P(X_i | Parents(X_i)).
– Edge directions have no causal meaning.
Causal Bayesian network: edge directions indicate causality.
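A minimal sketch of the factorization (the three-node network and its CPTs are invented for illustration):

```python
# Illustrative three-node causal Bayesian network with invented CPTs:
# Smoking -> Lung cancer -> Coughing. The joint probability is the product
# of each node's probability given its parents.
from itertools import product

P_smoking = {True: 0.3, False: 0.7}
P_cancer_given_smoking = {True: 0.2, False: 0.02}
P_cough_given_cancer = {True: 0.8, False: 0.1}

def joint(s, c, k):
    """P(Smoking=s, Cancer=c, Coughing=k) = P(s) * P(c|s) * P(k|c)."""
    ps = P_smoking[s]
    pc = P_cancer_given_smoking[s] if c else 1 - P_cancer_given_smoking[s]
    pk = P_cough_given_cancer[c] if k else 1 - P_cough_given_cancer[c]
    return ps * pc * pk

# Sanity check: the factorized joint sums to 1 over all assignments.
assert abs(sum(joint(*v) for v in product([True, False], repeat=3)) - 1) < 1e-12
print(joint(True, True, True))  # 0.3 * 0.2 * 0.8 = 0.048
```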

Example of a causal discovery algorithm.
Algorithm: PC (Peter Spirtes and Clark Glymour, 1999). Let A, B, C ∈ X and V ⊆ X. Initialize with a fully connected, unoriented graph.
1. Find unoriented edges by using the criterion that variable A shares a direct edge with variable B iff no subset of other variables V can render them conditionally independent (A ⊥ B | V).
2. Orient edges in "collider" triplets (i.e., of the type A → C ← B) using the criterion that if there are direct edges between A and C and between C and B, but not between A and B, then A → C ← B, iff there is no subset V containing C such that A ⊥ B | V.
3. Further orient edges with a constraint-propagation method, adding orientations until no further orientation can be produced, using the two following criteria: (i) if A → B → … → C and A — C (i.e., there is an undirected edge between A and C), then A → C; (ii) if A → B — C (with A and C not adjacent), then B → C.
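The sketch below (mine, and deliberately simplified) implements step 1, the skeleton search, using Fisher-z partial-correlation tests as the conditional-independence oracle; real PC implementations restrict the conditioning sets to current neighbours for efficiency.

```python
# Simplified PC step 1 (skeleton search). Conditional independence is tested
# with Fisher-z on partial correlations (valid for jointly Gaussian data).
# Real PC restricts conditioning sets to current neighbours; for brevity this
# version tries all subsets of the remaining variables.
import itertools
import numpy as np
from scipy import stats

def ci_test(data, i, j, cond, alpha=0.01):
    """True if X_i appears independent of X_j given X_cond."""
    sub = np.corrcoef(data[:, [i, j] + list(cond)], rowvar=False)
    prec = np.linalg.inv(sub)
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])  # partial correlation
    z = 0.5 * np.log((1 + r) / (1 - r))                 # Fisher z-transform
    stat = np.sqrt(data.shape[0] - len(cond) - 3) * abs(z)
    return 2 * (1 - stats.norm.cdf(stat)) > alpha       # p-value > alpha

def pc_skeleton(data, alpha=0.01):
    p = data.shape[1]
    edges = {frozenset((i, j)) for i in range(p) for j in range(i + 1, p)}
    for k in range(p - 1):                              # conditioning-set size
        for e in list(edges):
            i, j = tuple(e)
            others = [v for v in range(p) if v not in e]
            if any(ci_test(data, i, j, S, alpha)
                   for S in itertools.combinations(others, k)):
                edges.discard(e)                        # some V separates i, j
    return edges

rng = np.random.default_rng(0)
n = 5_000
x0 = rng.normal(size=n); x1 = x0 + rng.normal(size=n); x2 = x1 + rng.normal(size=n)
data = np.column_stack([x0, x1, x2])                    # chain X0 -> X1 -> X2
print(sorted(tuple(sorted(e)) for e in pc_skeleton(data)))  # [(0, 1), (1, 2)]
```

On the chain X0 → X1 → X2 the spurious X0—X2 edge is removed once the search conditions on X1.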

Computational and statistical complexity. Computing the full causal graph poses:
– computational challenges (intractable for large numbers of variables);
– statistical challenges (difficulty of estimating conditional probabilities with many variables and few samples).
Compromises:
– Develop algorithms with good average-case performance, tractable for many real-life datasets.
– Abandon learning the full causal graph and instead develop methods that learn a local neighborhood.
– Abandon learning the fully oriented causal graph and instead develop methods that learn unoriented graphs.

A prototypical MB algorithm: HITON (Aliferis-Tsamardinos-Statnikov, 2003). [Figure: target Y amid candidate variables]

1 – Identify variables with direct edges to the target Y (parents/children). Example: Iteration 1: add A. Iteration 2: add B. Iteration 3: remove B because B ⊥ Y | A; etc.

2 – Repeat the algorithm for the parents and children of Y (to get the depth-two relatives).

3 – Remove non-members of the MB: a member A of PC(PC) that is not in PC is a member of the Markov blanket if there is some member B of PC such that A becomes conditionally dependent with Y conditioned on some subset of the remaining variables together with B.
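Here is a didactic sketch of the interleaved forward/backward loop of step 1 (my own simplification, not the full HITON algorithm; it assumes the ci_test helper from the PC sketch above is in scope):

```python
# Didactic HITON-PC-style loop (not the full algorithm). Assumes the ci_test
# helper from the PC sketch above is in scope. Candidates are admitted by
# marginal association rank; after each admission, any variable that some
# subset of the current set renders independent of the target is removed.
import itertools
import numpy as np

def hiton_pc(data, target, alpha=0.01):
    p = data.shape[1]
    rank = lambda v: -abs(np.corrcoef(data[:, v], data[:, target])[0, 1])
    pc = []
    for v in sorted((v for v in range(p) if v != target), key=rank):
        pc.append(v)                                   # forward phase
        for u in list(pc):                             # backward phase
            rest = [w for w in pc if w != u]
            if any(ci_test(data, u, target, S, alpha)
                   for k in range(len(rest) + 1)
                   for S in itertools.combinations(rest, k)):
                pc.remove(u)
    return pc

# With the chain data from the PC sketch (X0 -> X1 -> X2):
# hiton_pc(data, target=2) -> [1]; X0 is rejected because X0 _|_ X2 | X1.
```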

Conclusion. Feature selection focuses on uncovering subsets of variables X_1, X_2, … predictive of the target Y. Multivariate feature selection is in principle more powerful than univariate feature selection, but not always in practice. Taking a closer look at the type of dependencies, in terms of causal relationships, may help refine the notion of variable relevance.

Acknowledgements and references
1) Feature Extraction: Foundations and Applications. I. Guyon et al., Eds. Springer, 2006.
2) Causal feature selection. I. Guyon, C. Aliferis, A. Elisseeff. To appear in "Computational Methods of Feature Selection", Huan Liu and Hiroshi Motoda, Eds., Chapman and Hall/CRC Press.