
Challenges in causality: Results of the WCCI 2008 challenge. Isabelle Guyon, Clopinet; Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ.; André Elisseeff and Jean-Philippe Pellet, IBM Zürich; Gregory F. Cooper, University of Pittsburgh; Peter Spirtes, Carnegie Mellon

Causal discovery. What affects… your health? …climate change? …the economy? Which actions will have beneficial effects?

What is causality? Many definitions: science, philosophy, law, psychology, history, religion, engineering. “Cause is the effect concealed, effect is the cause revealed” (Hindu philosophy)

The system (diagram): systemic causality; external agent.

Difficulty. A lot of “observational” data. Correlation ≠ Causality! Experiments are often needed, but: costly, unethical, or infeasible.

Causality workbench

Our approach What is the causal question? Why should we care? What is hard about it? Is this solvable? Is this a good benchmark?

Four tasks Toy datasets Challenge datasets

On-line feed-back

Toy Examples

LUCAS 0 : natural. Causality assessment with manipulations. (Causal graph over: Anxiety, Peer Pressure, Smoking, Genetics, Yellow Fingers, Lung Cancer, Allergy, Coughing, Attention Disorder, Fatigue, Car Accident, Born an Even Day.)

LUCAS 1 : manipulated. Causality assessment with manipulations. (Same variables as LUCAS 0, with some manipulated.)

LUCAS 2 : manipulated. Causality assessment with manipulations. (Same variables as LUCAS 0, with some manipulated.)

Goal-driven causality. We define: V = variables of interest (e.g. MB, direct causes, ...). Participants return: S = selected subset (ordered or not). We assess causal relevance: Fscore = f(V, S).

Causality assessment without manipulation?

Using artificial “probes”. LUCAP 0 : natural. (Real variables as in LUCAS, plus artificial probe variables P1, P2, P3, …, PT.)

Using artificial “probes”. LUCAP 1&2 : manipulated. (Real variables plus probes P1, P2, P3, …, PT.)
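The probe idea above can be sketched in code. This is one hypothetical construction (shuffled copies of real variables), shown only to illustrate how artificial variables with realistic marginals, known by construction to be non-causes, can be appended to a dataset; the challenge used its own probe generators.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_probes(X, n_probes, rng):
    """Append artificial 'probe' columns to the data matrix X.

    Each probe copies a randomly chosen real variable and permutes its
    rows independently, so it keeps a realistic marginal distribution
    but is, by construction, a non-cause of any target.
    """
    n, d = X.shape
    probes = np.empty((n, n_probes))
    for j in range(n_probes):
        col = X[:, rng.integers(d)].copy()
        rng.shuffle(col)  # break any dependency with the other columns
        probes[:, j] = col
    return np.hstack([X, probes])

X = rng.normal(size=(100, 5))
Xp = add_probes(X, 3, rng)
print(Xp.shape)  # (100, 8)
```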

Scoring using “probes”. What we can compute (Fscore): –Negative class = probes (here, all “non-causes”, all manipulated). –Positive class = other variables (may include causes and non-causes). What we want (Rscore): –Positive class = causes. –Negative class = non-causes. What we get (asymptotically): Fscore = (N_TruePos / N_Real) · Rscore + (N_TrueNeg / N_Real) · 1/2
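The Fscore above can be illustrated with a small AUC-style sketch in which probes play the negative class. The helper name and interface are hypothetical; only the probes-as-negatives idea comes from the slide.

```python
import numpy as np

def probe_fscore(scores, is_probe):
    """AUC-style Fscore: probes are the negative class, real variables
    the positive class. `scores` are the causal-relevance scores a
    participant assigned to the variables; `is_probe` flags the
    artificial ones. Returns the fraction of (real, probe) pairs
    ranked correctly, counting ties as 1/2.
    """
    pos = scores[~is_probe]
    neg = scores[is_probe]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = np.array([0.9, 0.8, 0.1, 0.2])
is_probe = np.array([False, False, True, True])
print(probe_fscore(scores, is_probe))  # 1.0
```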

Results

AUC distribution

Methods employed Causal: Methods employing causal discovery technique to unravel cause-effect relationships in the neighborhood of the target. Markov blanket: Methods for extracting the Markov blanket, without attempting to unravel cause-effect relationships. Feature selection: Methods for selecting predictive features making no explicit attempt to uncover the Markov blanket or perform causal discovery.

Formalism: Causal Bayesian networks. Bayesian network: –Graph with random variables X1, X2, …, Xn as nodes. –Dependencies represented by edges. –Allows us to compute P(X1, X2, …, Xn) as ∏i P(Xi | Parents(Xi)). –Edge directions have no causal meaning. Causal Bayesian network: edge directions indicate causality.
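The factorization P(X1, …, Xn) = ∏i P(Xi | Parents(Xi)) can be checked on a toy network. The chain A → B → C and its probability tables below are made up for illustration.

```python
import itertools

# Hypothetical 3-node network A -> B -> C with binary variables.
P_A = {0: 0.6, 1: 0.4}
P_B_given_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}    # P(B|A)
P_C_given_B = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.25, 1: 0.75}}  # P(C|B)

def joint(a, b, c):
    """P(A,B,C) = P(A) * P(B|A) * P(C|B): each node conditioned on its parents."""
    return P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]

# The eight joint probabilities must sum to 1.
total = sum(joint(a, b, c) for a, b, c in itertools.product([0, 1], repeat=3))
print(round(total, 10))  # 1.0
```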

Causal discovery from “observational data”. Example algorithm: PC (Peter Spirtes and Clark Glymour, 1999). Let A, B, C ∈ X and V ⊆ X. Initialize with a fully connected un-oriented graph. 1. Conditional independence. Cut connection if ∃V s.t. (A ⊥ B | V). 2. Colliders. In triplets A — C — B (A, B non-adjacent), if there is no subset V containing C s.t. A ⊥ B | V, orient edges as: A → C ← B. 3. Constraint propagation. Orient edges until no change: (i) If A → B → … → C, and A — C, then A → C. (ii) If A → B — C, then B → C.
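Step 1 (conditional-independence pruning) can be sketched as follows. For clarity this naive version searches all conditioning sets drawn from the remaining variables; the actual PC algorithm restricts them to current neighbors, which is what makes it efficient. The `indep` oracle is a hypothetical stand-in for a statistical independence test.

```python
import itertools

def skeleton(variables, indep):
    """Prune a fully connected un-oriented graph: cut edge A-B when some
    conditioning set V (from the other variables) makes A and B
    independent. Returns the surviving edges and the separating sets
    (the sepsets are later used to orient colliders)."""
    edges = {frozenset(p) for p in itertools.combinations(variables, 2)}
    sepset = {}
    for a, b in itertools.combinations(variables, 2):
        others = [v for v in variables if v not in (a, b)]
        done = False
        for k in range(len(others) + 1):
            for cond in itertools.combinations(others, k):
                if indep(a, b, set(cond)):
                    edges.discard(frozenset((a, b)))
                    sepset[frozenset((a, b))] = set(cond)
                    done = True
                    break
            if done:
                break
    return edges, sepset

# Oracle for the chain X -> Y -> Z: X and Z are independent given {Y}.
def indep(a, b, cond):
    return {a, b} == {"X", "Z"} and "Y" in cond

edges, sepset = skeleton(["X", "Y", "Z"], indep)
print(sorted(sorted(e) for e in edges))  # [['X', 'Y'], ['Y', 'Z']]
```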

Computational and statistical complexity. Computing the full causal graph poses: computational challenges (intractable for large numbers of variables) and statistical challenges (difficulty of estimating conditional probabilities for many variables with few samples). Compromises: Develop algorithms with good average-case performance, tractable for many real-life datasets. Abandon learning the full causal graph and instead develop methods that learn a local neighborhood. Abandon learning the fully oriented causal graph and instead develop methods that learn unoriented graphs.

A prototypical MB algorithm: HITON (Aliferis, Tsamardinos & Statnikov, 2003). Target Y.

1 – Identify variables with direct edges to the target (parents/children).

1 – Identify variables with direct edges to the target (parents/children). Iteration 1: add A. Iteration 2: add B. Iteration 3: remove A because A ⊥ Y | B; etc.
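The interleaved add/eliminate loop of the iterations above can be sketched as follows. `dep_given` is a hypothetical conditional-dependence test, and the two-variable example mirrors the slide: A is removed once B enters, because A ⊥ Y | B.

```python
def hiton_pc_sketch(candidates, relevance, dep_given):
    """Sketch of HITON's interleaved parent/children search (step 1).

    Variables enter the tentative set in order of univariate relevance;
    after each addition, any member that the others render conditionally
    independent of the target is removed. `dep_given(x, cond)` is a
    hypothetical test: True iff x stays dependent on the target Y given
    every subset of `cond`.
    """
    pc = []
    for x in sorted(candidates, key=relevance, reverse=True):
        pc.append(x)
        # elimination pass: drop members made independent of the target
        pc = [v for v in pc if dep_given(v, [u for u in pc if u != v])]
    return pc

# Toy oracle mirroring the slide: A becomes independent of Y given B.
def dep_given(x, cond):
    return not (x == "A" and "B" in cond)

relevance = {"A": 0.9, "B": 0.8}.get
print(hiton_pc_sketch(["A", "B"], relevance, dep_given))  # ['B']
```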

2 – Repeat the algorithm for the parents and children of Y (get depth-two relatives).

3 – Remove non-members of the MB. A member A of PCPC that is not in PC is a member of the Markov blanket if there is some member B of PC such that A becomes conditionally dependent with Y conditioned on some subset of the remaining variables and B.

4 – Orient edges. 1. Colliders: the presence of a spouse determines a collider; the target itself may also be a collider (B and C become dependent given Y). 2. Orient the remaining edges.

Additional Bells and Whistles. The basic algorithms make simplifying assumptions: –Faithfulness (any conditional independence between two variables results in the absence of a direct edge). –Causal sufficiency (there are no unobserved common causes of the observed variables). Laura E. Brown & Ioannis Tsamardinos: –Violations of “faithfulness”: select products of features. –Violations of “causal sufficiency”: use Y structures.

Discussion

Top ranking methods According to the rules of the challenge: –Yin Wen Chang: SVM => best prediction accuracy on REGED and CINA. –Gavin Cawley: Causal explorer + linear ridge regression ensembles => best prediction accuracy on SIDO and MARTI. According to pairwise comparisons: –Jianxin Yin and Prof. Zhi Geng’s group: Partial Orientation and Local Structural Learning => best on Pareto front, new original causal discovery algorithm.

Pairwise comparisons

Causal vs. non-causal Jianxin Yin: causal Vladimir Nikulin: non-causal

Using the manip-MB as feature set ≠ using a causal model. Heuristic: (1) use the post-manipulation MB as feature set; (2) train a classifier to predict Y on training data (from the unmanipulated distribution). Problem: manipulated children of the target may remain in the post-manipulation MB (if they are also spouses), but with a different dependency to the target. (Figure: unmanipulated (training), manipulation #1 (test), manipulation #2 (test).)

MB is not the best feature set? Some features outside the MB may enhance predictivity if: a. Some MB features go undetected (e.g. the direct causes are children of a common ancestor). b. The predictor is too “weak” (e.g. the relationship to the target is non-linear but the predictor is linear). (Figure: cases (a) and (b); in (b), X causes Y and Z with y = a·x², z = x².)
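Case (b) can be reproduced numerically with the slide's own example (y = a·x², z = x²): a linear predictor using the raw cause x does worse than one using the non-MB feature z, even though x alone is the Markov blanket of Y here.

```python
import numpy as np

rng = np.random.default_rng(1)
a = 2.0
x = rng.normal(size=200)
z = x ** 2          # z is a deterministic function of the cause x
y = a * x ** 2      # the target depends on x only through x**2

def linear_fit_mse(feat, y):
    """Least-squares fit y ~ w*feat + b; return the mean squared error."""
    A = np.column_stack([feat, np.ones_like(feat)])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean((A @ w - y) ** 2))

mse_x = linear_fit_mse(x, y)  # linear in the true cause x: poor fit
mse_z = linear_fit_mse(z, y)  # linear in z = x**2: exact fit
print(mse_z < mse_x)  # True
```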

Insensitivity to irrelevant features. Simple univariate predictive model, binary target and features; all relevant features correlate perfectly with the target, all irrelevant features randomly drawn. With 98% confidence, abs(feat_weight) < w and Σi wi·xi < v. n_g = number of “good” (relevant) features; n_b = number of “bad” (irrelevant) features; m = number of training examples.
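A quick simulation of the setting above (±1 target and features, a relevant feature correlating perfectly, irrelevant ones random): empirical-correlation weights of irrelevant features concentrate near zero at rate O(1/√m), so with high confidence their magnitudes stay far below that of a relevant feature. The weight definition below is one simple univariate model chosen for illustration, not necessarily the one analyzed on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_b = 100, 1000            # training examples, irrelevant features
y = rng.choice([-1, 1], size=m)

# Relevant feature: copies the target. Irrelevant: random +/-1 coin flips.
x_good = y.copy()
X_bad = rng.choice([-1, 1], size=(m, n_b))

# Univariate weights: w_i = mean(y * x_i), an empirical correlation in +/-1 coding.
w_good = np.mean(y * x_good)
w_bad = np.mean(y[:, None] * X_bad, axis=0)

print(w_good)  # 1.0 (perfect correlation)
# 98th percentile of irrelevant-feature weight magnitudes stays well below w_good:
print(np.quantile(np.abs(w_bad), 0.98) < w_good)  # True
```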

Conclusion. Causal discovery from observational data is not an impossible task, but a very hard one. This points to the need for further research and benchmarks. Don't miss the “pot-luck challenge”!

1) Causal Feature Selection. I. Guyon, C. Aliferis, A. Elisseeff. In “Computational Methods of Feature Selection”, Huan Liu and Hiroshi Motoda, Eds., Chapman and Hall/CRC Press. 2) Design and Analysis of the Causation and Prediction Challenge. I. Guyon, C. Aliferis, G. Cooper, A. Elisseeff, J.-P. Pellet, P. Spirtes, A. Statnikov. JMLR workshop proceedings, in press.