Can causal models be evaluated? Isabelle Guyon ClopiNet / ChaLearn
Acknowledgements and references
1) Feature Extraction: Foundations and Applications. I. Guyon, S. Gunn, et al. Springer.
2) Causation and Prediction Challenge. I. Guyon, C. Aliferis, G. Cooper, A. Elisseeff, J.-P. Pellet, P. Spirtes, and A. Statnikov, Eds. CiML, volume 2, Microtome.
Co-founders: Constantin Aliferis, Alexander Statnikov, André Elisseeff, Jean-Philippe Pellet, Gregory F. Cooper, Peter Spirtes. ChaLearn directors and advisors: Alexander Statnikov, Ioannis Tsamardinos, Richard Scheines, Frederick Eberhardt, Florin Popescu.
Preparation of ExpDeCo: Experimental design in causal discovery
- Motivations
- Quiz
- What we want to do (next challenge)
- What we already set up (virtual lab)
- What we could improve
- Your input…
Note: Experiment = manipulation = action.
Causal discovery motivations (1): Interesting problems. What affects… your health? … climate change? … the economy? … and which actions will have beneficial effects?
Predict the consequences of (new) actions
- Predict the outcome of actions: What if we ate only raw foods? What if we required all cars to be painted white? What if we broke up the Euro?
- Find the best action to obtain a desired outcome: determine treatments (medicine), determine policies (economics).
- Predict counterfactuals: a man not wearing his seatbelt died in a car accident. Would he have died had he worn it?
Causal discovery motivations (2) Lots of data available
Causal discovery motivations (3): Classical ML is helpless. [Figure: two-variable causal graphs relating X and Y.]
Predict the consequences of actions: under "manipulations" by an external agent, only causes are predictive; consequences and confounders are not.
If manipulated, a cause influences the outcome…
… a consequence does not …
… and neither does a confounder (a consequence of a common cause).
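To make this concrete, here is a minimal simulation sketch (not from the slides; the linear system, its coefficients, and the variable names are assumptions) in which X causes Y and Z is a consequence of Y: observationally both X and Z predict Y, but only manipulating the cause X moves Y.

```python
# Hypothetical linear system: X is a cause of Y, Z is a consequence of Y.
import numpy as np

rng = np.random.default_rng(0)
m = 100_000

def sample(intervene_on=None, value=0.0):
    """Draw m samples; optionally clamp one variable to `value` (a manipulation)."""
    x = rng.normal(size=m)
    if intervene_on == "x":
        x = np.full(m, value)
    y = 2.0 * x + rng.normal(size=m)        # X causes Y
    z = -1.5 * y + rng.normal(size=m)       # Z is a consequence of Y
    if intervene_on == "z":
        z = np.full(m, value)               # clamping Z does not feed back into Y
    return x, y, z

# Observational data: both X and Z are strongly correlated with Y.
x, y, z = sample()
print("corr(X,Y) =", round(np.corrcoef(x, y)[0, 1], 2),
      " corr(Z,Y) =", round(np.corrcoef(z, y)[0, 1], 2))

# Manipulate the cause X: the distribution of Y shifts.
_, y_do_x, _ = sample(intervene_on="x", value=3.0)
# Manipulate the consequence Z: the distribution of Y is unchanged.
_, y_do_z, _ = sample(intervene_on="z", value=3.0)
print("E[Y] natural =", round(y.mean(), 2),
      " E[Y | do(X=3)] =", round(y_do_x.mean(), 2),
      " E[Y | do(Z=3)] =", round(y_do_z.mean(), 2))
```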
Special case: stationary or cross-sectional data (no time series). Superficially, the problem resembles a classical feature selection problem. [Figure: data matrix X with m examples and n features, reduced to n' selected features.]
Quiz
What could be the causal graph?
Could it be that? [Candidate causal graph over X1, X2, and Y.]
Let's try it. [Scatter plots of Y against x1 and x2.] Simpson's paradox: X1 || X2 | Y.
Could it be that? [Another candidate causal graph over X1, X2, and Y.]
Let's try it. [Scatter plots of Y against x1 and x2 for this candidate graph.]
Plausible explanation: peak (X1), baseline (X2), health (Y: normal vs. disease). [Figure: causal graph and scatter plots of x1 and x2.] X2 || Y; X2 || Y | X1.
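A toy illustration of Simpson's paradox of the kind evoked in the quiz; the quiz's actual data-generating process is only shown graphically in the slides, so the distributions and coefficients below are hypothetical. Within each health group the two measurements move together, yet the pooled association has the opposite sign.

```python
# Hypothetical Simpson's-paradox data: Y shifts both "peak" (X1) and "baseline" (X2).
import numpy as np

rng = np.random.default_rng(1)
m = 5_000

y = rng.integers(0, 2, size=m)                               # Y: 0 = normal, 1 = disease
x1 = rng.normal(loc=4.0 * y, scale=1.0)                      # "peak": shifted up by disease
x2 = 0.8 * x1 - 6.0 * y + rng.normal(scale=0.5, size=m)      # "baseline"

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print("pooled corr(X1, X2)        =", round(corr(x1, x2), 2))                    # negative
print("corr(X1, X2 | Y = normal)  =", round(corr(x1[y == 0], x2[y == 0]), 2))    # positive
print("corr(X1, X2 | Y = disease) =", round(corr(x1[y == 1], x2[y == 1]), 2))    # positive
```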
What we would like. [Causal graph over X1, X2, Y and the corresponding scatter plots.]
Manipulate X1. [Scatter plots of Y against x1 and x2 under manipulation of X1.]
Manipulate X2. [Scatter plots of Y against x1 and x2 under manipulation of X2.]
What we want to do
Causal data mining: how are we going to do it?
- Obstacle 1 (practical): many statements of the "causality problem".
- Obstacle 2 (fundamental): it is very hard to assess solutions.
Evaluation. Experiments are often:
- Costly
- Unethical
- Infeasible
Non-experimental "observational" data is abundant and costs less.
New challenge: ExpDeCo (Experimental design in causal discovery)
- Goal: find variables that strongly influence an outcome.
- Method:
  - Learn from a "natural" distribution (observational data).
  - Predict the consequences of given actions (checked against a test set of "real" experimental data).
  - Iteratively refine the model with experiments (using on-line learning from experimental data).
What we have already done
Virtual lab: QUERIES are answered (ANSWERS) by models of systems and a database. [Figure: example causal graph with variables Smoking, Genetics, Lung Cancer, Coughing, Attention Disorder, Allergy, Anxiety, Peer Pressure, Yellow Fingers, Car Accident, Born an Even Day, Fatigue.]
- February 2007: Project starts. Pascal2 funding.
- August 2007: Two-year NSF grant.
- Dec. 2007: Workbench alive. 1st causality challenge.
- Sept. 2008: 2nd causality challenge (Pot luck).
- Fall 2009: Virtual lab alive.
- Dec. 2009: Active Learning Challenge (Pascal2).
- December 2010: Unsupervised and Transfer Learning Challenge (DARPA).
- Fall 2012: ExpDeCo (Pascal2).
- Planned: CoMSiCo.
What remains to be done
ExpDeCo (new challenge). Setup: several paired datasets (preferably real data):
- "Natural" distribution
- "Manipulated" distribution
Problems:
- Learn a causal model from the natural distribution.
- Assessment 1: test with the natural distribution.
- Assessment 2: test with the manipulated distribution.
- Assessment 3: on-line learning from the manipulated distribution (sequential design of experiments); see the sketch after this list.
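As a rough sketch of how the three assessments could be run against a virtual lab, here is hypothetical code. The ToyLab class, its sample() interface, and the tiny linear system inside it are stand-ins, not the actual challenge API; the point is only the train-on-natural / test-on-manipulated / refine-with-experiments structure.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)

class ToyLab:
    """Stand-in for the challenge server; hides a tiny linear system in which
    feature 0 causes the target and feature 1 is a consequence of it."""
    def sample(self, n, do=None):
        cause = rng.normal(size=n)
        if do is not None and do[0] == 0:
            cause = np.full(n, do[1])             # manipulation clamps the cause
        target = 2.0 * cause + rng.normal(size=n)
        effect = -target + rng.normal(size=n)
        if do is not None and do[0] == 1:
            effect = np.full(n, do[1])            # manipulation clamps the consequence
        return np.column_stack([cause, effect]), target

lab, action = ToyLab(), (1, 3.0)                  # the given action: set feature 1 to 3

# Learn from the natural distribution only.
X_nat, y_nat = lab.sample(2_000)
model = LinearRegression().fit(X_nat, y_nat)

# Assessment 1: natural test set.  Assessment 2: manipulated test set (score typically drops).
print("R2 natural    :", round(model.score(*lab.sample(2_000)), 2))
print("R2 manipulated:", round(model.score(*lab.sample(2_000, do=action)), 2))

# Assessment 3: iteratively refine with experimental (manipulated) data
# (here simply re-fit on all data gathered so far).
for t in range(5):
    X_exp, y_exp = lab.sample(500, do=action)
    X_nat, y_nat = np.vstack([X_nat, X_exp]), np.concatenate([y_nat, y_exp])
    model.fit(X_nat, y_nat)
    print(f"round {t}: R2 manipulated =",
          round(model.score(*lab.sample(2_000, do=action)), 2))
```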
Challenge design constraints
- Largely not relying on "ground truth", which is difficult or impossible to get (in real data).
- Not biased towards particular methods.
- Realistic setting, as close as possible to actual use.
- Statistically significant, not decided by "chance".
- Reproducible on other similar data.
- Not specific to very particular settings.
- No cheating possible.
- Capitalize on classical experimental design.
Lessons learned from the Causation & Prediction Challenge
Causation and Prediction Challenge. [Tables of toy datasets and challenge datasets.]
Assessment w. manipulations (artificial data)
Causality assessment with manipulations. LUCAS0: natural distribution. [Figure: LUCAS causal graph with variables Smoking, Genetics, Lung Cancer, Coughing, Attention Disorder, Allergy, Anxiety, Peer Pressure, Yellow Fingers, Car Accident, Born an Even Day, Fatigue.]
Causality assessment with manipulations. LUCAS1: manipulated. [Figure: the same graph after manipulation.]
Causality assessment with manipulations. LUCAS2: manipulated. [Figure: the same graph after a different manipulation.]
Assessment with ground truth. Participants score feature relevance: S = ordered list of features. We define V = variables of interest (the theoretical minimal set of predictive variables, e.g. MB (Markov blanket), direct causes, ...). We assess causal relevance with AUC = f(V, S).
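For illustration, a small sketch of this scoring step: the feature indices, the target set V, and the submitted list S below are made up, and ties are handled with the usual AUC convention (counted 1/2).

```python
# AUC of a submitted feature ranking S against a set V of variables of interest.
import numpy as np

def causal_relevance_auc(S, V, n_features):
    """AUC of the ranking S (best first) with respect to the target set V."""
    # Score each feature by its reversed rank; unranked features get the worst score.
    scores = np.zeros(n_features)
    for rank, feat in enumerate(S):
        scores[feat] = n_features - rank
    labels = np.isin(np.arange(n_features), list(V))
    pos, neg = scores[labels], scores[~labels]
    # AUC = probability that a random member of V is ranked above a random non-member.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Example: 10 features, V = {0, 3, 7} (e.g. a Markov blanket), hypothetical ordered list S.
S = [3, 0, 5, 7, 1, 2, 9, 8, 6, 4]
print(causal_relevance_auc(S, V={0, 3, 7}, n_features=10))   # ~0.95
```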
Assessment without manip. (real data)
Using artificial "probes". LUCAP0: natural distribution. [Figure: LUCAS graph (Smoking, Genetics, Lung Cancer, Coughing, Attention Disorder, Allergy, Anxiety, Peer Pressure, Yellow Fingers, Car Accident, Born an Even Day, Fatigue) augmented with artificial probe variables P1, P2, P3, …, PT.]
Using artificial "probes". LUCAP1&2: manipulated. [Figure: the same probed graph under manipulation.]
Scoring using "probes"
What we can compute (Fscore):
- Negative class = probes (here, all "non-causes", all manipulated).
- Positive class = other variables (may include causes and non-causes).
What we want (Rscore):
- Positive class = causes.
- Negative class = non-causes.
What we get (asymptotically): Fscore = (N_TruePos / N_Real) × Rscore + ½ (N_TrueNeg / N_Real).
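The ½ term reflects the assumption that irrelevant real variables rank at chance level against the probes. A toy simulation (with hypothetical score distributions, not challenge data) checking the asymptotic relation:

```python
# Check Fscore ≈ (N_TruePos/N_Real)*Rscore + 0.5*(N_TrueNeg/N_Real) on simulated rankings.
import numpy as np

rng = np.random.default_rng(3)
n_cause, n_noncause, n_probe = 200, 800, 1000     # real variables = causes + non-causes

def auc(pos, neg):
    """P(random positive scored above random negative), ties counted 1/2."""
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

# Imperfect causal ranking: causes tend to get higher relevance scores;
# probes are drawn from the same score distribution as real non-causes.
s_cause = rng.normal(loc=1.0, size=n_cause)
s_noncause = rng.normal(loc=0.0, size=n_noncause)
s_probe = rng.normal(loc=0.0, size=n_probe)

rscore = auc(s_cause, s_noncause)                               # what we want
fscore = auc(np.concatenate([s_cause, s_noncause]), s_probe)    # what we can compute
n_real = n_cause + n_noncause
predicted = (n_cause / n_real) * rscore + 0.5 * (n_noncause / n_real)
print(f"Rscore={rscore:.3f}  Fscore={fscore:.3f}  predicted Fscore={predicted:.3f}")
```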
Pairwise comparisons
Causal vs. non-causal: Jianxin Yin (causal) vs. Vladimir Nikulin (non-causal).
Insensitivity to irrelevant features. Consider a simple univariate predictive model with a binary target and binary features, where all relevant features correlate perfectly with the target and all irrelevant features are randomly drawn. With 98% confidence, the irrelevant features satisfy |w_i| < w and |Σ_i w_i x_i| < v. Notation: n_g = number of "good" (relevant) features, n_b = number of "bad" (irrelevant) features, m = number of training examples.
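A minimal simulation of this setting (the exact 98%-confidence bounds w and v from the slide are not reproduced, only the scaling behaviour): relevant weights stay at 1, while irrelevant weights, and their aggregate contribution to the decision function, shrink with the number of training examples.

```python
# Assumed setup: +/-1 target and features; weight w_i = feature/target correlation on m examples.
import numpy as np

rng = np.random.default_rng(4)
m, n_g, n_b = 1_000, 10, 10_000        # training examples, "good" and "bad" features

y = rng.choice([-1, 1], size=m)
X_good = np.tile(y[:, None], (1, n_g))                 # relevant: copies of the target
X_bad = rng.choice([-1, 1], size=(m, n_b))             # irrelevant: random signs
X = np.hstack([X_good, X_bad])

w = (X * y[:, None]).mean(axis=0)                      # univariate weights w_i
print("relevant weights   :", w[:n_g].round(2))                   # all equal to 1
print("max |irrelevant w| :", np.abs(w[n_g:]).max().round(2))     # on the order of 1/sqrt(m)

# Contribution of each group to the decision function sum_i w_i * x_i on a fresh point.
x_new_bad = rng.choice([-1, 1], size=n_b)
print("relevant contribution  :", n_g * 1.0)           # n_g (each weight 1, each feature = target)
print("irrelevant contribution:", float(np.abs(w[n_g:] @ x_new_bad).round(2)))  # ~sqrt(n_b/m)
```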
How to overcome this problem?
- Learning curve in terms of the number of features revealed: without re-training on manipulated data, and with on-line learning on manipulated data (one possible realization is sketched below).
- Give pre-manipulation variable values and the value of the manipulation.
- Other metrics: stability, residuals, instrumental variables, missing features by design.
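One plausible way to realize the feature-revelation learning curve; the challenge's exact protocol, predictor, and scoring are not specified here, so the ranking, data, and the use of mean accuracy as the curve summary below are all assumptions.

```python
# Score a predictor using only the k top-ranked features, for growing k,
# and summarize the learning curve by its normalized area.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def feature_learning_curve(X, y, ranking):
    """Test accuracy as a function of the number of top-ranked features revealed."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    scores = []
    for k in range(1, len(ranking) + 1):
        cols = ranking[:k]
        clf = LogisticRegression(max_iter=1000).fit(X_tr[:, cols], y_tr)
        scores.append(clf.score(X_te[:, cols], y_te))
    return np.array(scores), np.mean(scores)      # curve and its normalized area

# Toy data: feature 0 is informative, the rest are noise; the ranking is hypothetical.
rng = np.random.default_rng(5)
y = rng.integers(0, 2, size=500)
X = np.column_stack([y + rng.normal(scale=0.5, size=500),
                     rng.normal(size=(500, 4))])
curve, area = feature_learning_curve(X, y, ranking=[0, 1, 2, 3, 4])
print(curve.round(2), " area =", round(area, 2))
```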
Conclusion (more: …)
- We want causal discovery to become "mainstream" data mining.
- We believe we need to start with "simple", standard evaluation procedures.
- Our design stays close to a typical prediction problem, but: training is on the natural distribution, testing on the manipulated distribution.
- We want to avoid pitfalls of previous challenge designs: reveal only pre-manipulation variable values, and reveal variables progressively, "on demand".