Causality challenge #2: Pot-Luck

Causality challenge #2: Pot-Luck Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André Elisseeff and Jean-Philippe Pellet, IBM Zürich Gregory F. Cooper, University of Pittsburgh Peter Spirtes, Carnegie Mellon

Outline: Motivations; Learning causal structure (from cross-sectional studies, from experiments, without experiments, equivalent Markov boundaries, from longitudinal studies); Bring your own problem(s). [Current section: Motivations]

Causality Workbench. February 2007: project starts, with initial funding from the EU Pascal network. August 15, 2007: two-year grant from the US National Science Foundation. December 15, 2007: workbench goes live. First causality challenge: causation and prediction. June 3-4, 2008: WCCI 2008 workshop to discuss the results of the first challenge. September 15, 2008: start of the pot-luck challenge; target: NIPS 2008. Fall 2008: start developing an interactive workbench.

Why a new challenge? Causality challenge #1 favored "depth": a single well-defined task and rigorous performance assessment. Causality challenge #2 favors "breadth": many different tasks, encouraging creativity.

http://clopinet.com/causality

Pot-Luck challenge. CYTO: Causal Protein-Signaling Networks in human T cells. Learn a protein signaling network from multicolor flow cytometry data. N=11 proteins, P~800 samples per experimental condition, E=9 conditions. LOCANET: LOcal CAusal NETwork. Find the local causal structure around a given target variable (depth-3 network) in REGED, CINA, SIDO, MARTI. PROMO: Simulated marketing task. Time series of 1000 promotion variables and 100 product sales. Predict a 1000x100 Boolean influence matrix, indicating for each (i,j) element whether the ith promotion has a causal influence on the sales of the jth product. Data are provided as time series, with a daily value for each variable for three years. SIGNET: Abscisic Acid Signaling Network. Determine the set of 43 Boolean rules that describe the interactions of the nodes within a plant signaling network, from 300 separate Boolean pseudodynamic simulations of the true rules. The model is inspired by a true biological system. TIED: Target Information Equivalent Dataset. Illustrates a case in which there are many equivalent Markov boundaries; find them all.

Learning causal structure

What is causality? Many definitions. Pragmatic (engineering) view: predicting the consequences of ACTIONS. Distinct from making predictions in a stationary environment. Canonical methodology: designed experiments. Causal discovery from observational data.

The "language" of causal Bayesian networks. Graph with random variables X1, X2, …, Xn as nodes; dependencies are represented by edges. Allows us to compute P(X1, X2, …, Xn) as ∏i P(Xi | Parents(Xi)). In a plain Bayesian network, edge directions carry no causal meaning; in a causal Bayesian network, edge directions indicate causality.
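To make the factorization concrete, here is a minimal sketch (hypothetical CPTs for a three-node toy network, not any of the challenge datasets) of how a Bayesian network computes the joint distribution from per-node conditional tables:

```python
# Joint probability of a tiny Bayesian network: Smoking -> LungCancer <- Genetics.
# Hypothetical conditional probability tables (CPTs), for illustration only.
p_smoking  = {1: 0.3, 0: 0.7}
p_genetics = {1: 0.1, 0: 0.9}
p_lc_given = {           # P(LC=1 | Smoking, Genetics)
    (1, 1): 0.8, (1, 0): 0.5,
    (0, 1): 0.3, (0, 0): 0.05,
}

def joint(s, g, lc):
    """P(S=s, G=g, LC=lc) = P(S) * P(G) * P(LC | S, G)."""
    p_lc1 = p_lc_given[(s, g)]
    return p_smoking[s] * p_genetics[g] * (p_lc1 if lc == 1 else 1.0 - p_lc1)

# The factorization sums to 1 over all configurations.
total = sum(joint(s, g, lc) for s in (0, 1) for g in (0, 1) for lc in (0, 1))
print(total)  # ~1.0
```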

Small example: LUCAS0 (natural distribution). [Diagram of a toy causal network with nodes Lung Cancer (the target), Smoking, Genetics, Coughing, Attention Disorder, Allergy, Anxiety, Peer Pressure, Yellow Fingers, Car Accident, Born an Even Day, and Fatigue; the Markov boundary of the target is highlighted.]
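As an aside for readers of this transcript, a short sketch of how a Markov boundary is read off a known DAG (parents, children, and the children's other parents); the edge list below is a hypothetical fragment in the spirit of LUCAS0, not the full network:

```python
# Markov boundary of a target in a DAG: parents, children, and spouses
# (other parents of the children). The edge list is a hypothetical fragment,
# not the full LUCAS0 network.
edges = [("Smoking", "LungCancer"), ("Genetics", "LungCancer"),
         ("LungCancer", "Coughing"), ("Allergy", "Coughing"),
         ("LungCancer", "Fatigue")]

def markov_boundary(edges, target):
    parents  = {a for a, b in edges if b == target}
    children = {b for a, b in edges if a == target}
    spouses  = {a for a, b in edges if b in children and a != target}
    return parents | children | spouses

print(markov_boundary(edges, "LungCancer"))
# {'Smoking', 'Genetics', 'Coughing', 'Fatigue', 'Allergy'}
```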

Arrows indicate "mechanisms". If Lung Cancer (LC) is determined by Smoking (S) and Genetics (G): in the language of Bayesian networks, use the conditional probability table P(LC | S, G), with one entry P(LC=1 | S=s, G=g) and P(LC=0 | S=s, G=g) for each configuration (s, g). In the language of Structural Equation Models (SEM), use LC = f(S, G) + noise, where f is usually a linear function.
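A minimal sketch of the SEM view, with made-up coefficients: each variable is generated from its parents plus independent noise, and a regression on the parents recovers the structural coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Structural-equation view of the same mechanism, with invented coefficients:
# each variable is a function of its parents plus independent noise.
S  = rng.binomial(1, 0.3, n)                          # Smoking
G  = rng.binomial(1, 0.1, n)                          # Genetics
LC = 0.6 * S + 0.4 * G + 0.1 * rng.normal(size=n)     # LC = f(S, G) + noise, f linear

# Regressing LC on (S, G) recovers the structural coefficients (approximately).
X = np.column_stack([S, G, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, LC, rcond=None)
print(coef[:2])  # close to [0.6, 0.4]
```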

Common simplifications: assume a Markov process; assume a DAG; assume causal sufficiency (no hidden common cause); assume stability or faithfulness (no particular parameterization implying dependencies not reflected by the structure); assume linearity of relationships; assume Gaussianity of PDFs; discard relationships of low statistical significance; focus on a local neighborhood of a target variable; learn unoriented or partially oriented graphs; assume uniqueness of the Markov boundary.

Cross-sectional study: how about time? [Diagram of an 11-variable network measured in a single snapshot, with no time dimension.]

How about time? [Diagram contrasting a cross-sectional study (one snapshot of the 11-variable network) with a longitudinal study (the same variables measured at successive time points 1, 2, 3, …).]

Learning causal structure from "cross-sectional" studies: CYTO, LOCANET, TIED

Causal models as particular “generative models” Imagine we have “prior knowledge” about a few alternative plausible “causal models” (we basically know the architecture). Fit the parameters of the model to data. Select the model based on goodness of fit (score), perhaps penalizing higher complexity models. Could two models have identical scores?
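A rough illustration of this generative-model view, assuming binary variables and a BIC-style penalized likelihood as the score (the data-generating mechanism below is invented for the example): the architecture that includes both parents of the target scores higher than the one that omits an edge.

```python
import numpy as np
from itertools import product

def bic_score(data, structure):
    """BIC of an all-binary Bayesian network.
    data: dict var -> 0/1 array; structure: dict var -> list of parent names."""
    n = len(next(iter(data.values())))
    score = 0.0
    for var, parents in structure.items():
        n_params = 2 ** len(parents)          # one Bernoulli parameter per parent configuration
        for cfg in product([0, 1], repeat=len(parents)):
            mask = np.ones(n, dtype=bool)
            for p, v in zip(parents, cfg):
                mask &= (data[p] == v)
            m = mask.sum()
            if m == 0:
                continue
            k = (data[var][mask] == 1).sum()
            for count in (k, m - k):
                if count > 0:
                    score += count * np.log(count / m)   # maximized log-likelihood
        score -= 0.5 * n_params * np.log(n)              # complexity penalty
    return score

# Two candidate architectures for the same three variables:
rng = np.random.default_rng(1)
S = rng.binomial(1, 0.5, 5000)
G = rng.binomial(1, 0.5, 5000)
LC = rng.binomial(1, 0.1 + 0.6 * S * G)       # LC depends on both S and G
data = {"S": S, "G": G, "LC": LC}

m1 = {"S": [], "G": [], "LC": ["S", "G"]}     # true architecture
m2 = {"S": [], "G": [], "LC": ["S"]}          # misses the Genetics edge
print(bic_score(data, m1) > bic_score(data, m2))  # True (with high probability)
```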

Key types of causal relationships (1): direct cause. [LUCAS diagram; e.g., Smoking is a direct cause of Lung Cancer.]

Key types of causal relationships (2): indirect cause (chain). [LUCAS diagram: Anxiety → Smoking → Lung Cancer.] AN ⊥ LC | S.

Key types of causal relationships (3): confounder (fork). [LUCAS diagram: Yellow Fingers ← Smoking → Lung Cancer.] If we do not know about Smoking, we might wrongly conclude from the observed correlation that Yellow Fingers causes cancer. If we do know about Smoking, the relation YF ⊥ LC | S does not tell us whether YF is an indirect cause of LC, because the same independence relation holds for chains.

How this might look in data: [scatter plot of Lung Cancer vs. Yellow Fingers, all data pooled].

How this might look in data: [the same scatter plot, split into smoking and non-smoking groups]. Simpson's paradox: YF ⊥ LC | S.
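A small simulation (hypothetical probabilities) of the effect shown on this slide: Smoking drives both Yellow Fingers and Lung Cancer, so the pooled data show a strong association that vanishes within each smoking stratum.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Invented probabilities: Smoking causes both Yellow Fingers and Lung Cancer;
# there is no direct YF -> LC link.
S  = rng.binomial(1, 0.5, n)
YF = rng.binomial(1, np.where(S == 1, 0.9, 0.1))
LC = rng.binomial(1, np.where(S == 1, 0.3, 0.05))

print(np.corrcoef(YF, LC)[0, 1])               # clearly positive overall
for s in (0, 1):
    m = S == s
    print(s, np.corrcoef(YF[m], LC[m])[0, 1])  # ~0 within each smoking stratum
```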

Markov equivalence. The conditional independence X1 ⊥ Y | X2 is compatible with three different graphs, all yielding the same joint distribution: X1 ← X2 → Y, with P(X1, X2, Y) = P(X1 | X2) P(Y | X2) P(X2); X1 → X2 → Y, with P(X1, X2, Y) = P(Y | X2) P(X2 | X1) P(X1); X1 ← X2 ← Y, with P(X1, X2, Y) = P(X1 | X2) P(X2 | Y) P(Y).

Key types of causal relationships (4): collider (V-structure). [LUCAS diagram: Allergy → Coughing ← Lung Cancer.] Allergy and Lung Cancer are independent, but dependent given Coughing: A ⊥ LC, but not A ⊥ LC | C. If we consider only coughing patients, A and LC may appear anti-correlated and we might wrongly conclude that A prevents LC.

How this might look in data: [scatter plot of Lung Cancer vs. Allergy, all data pooled].

How this might look in data: [the same scatter plot, split by Coughing=1 and Coughing=0].

No Markov equivalence. Colliders (V-structures): X1 → X2 ← Y, with P(X1, X2, Y) = P(X2 | X1, Y) P(X1) P(Y); here X1 ⊥ Y, but X1 and Y are dependent given X2.
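A quick numerical check of the collider signature (Gaussian toy data, with linear partial correlation as an approximate conditional-independence test): X1 and Y are marginally independent but become dependent once we condition on their common effect X2.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

# Collider: X1 -> X2 <- Y
X1 = rng.normal(size=n)
Y  = rng.normal(size=n)
X2 = X1 + Y + 0.5 * rng.normal(size=n)

def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing out c (Gaussian case)."""
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(ra, rb)[0, 1]

print(np.corrcoef(X1, Y)[0, 1])    # ~0 : X1 and Y marginally independent
print(partial_corr(X1, Y, X2))     # clearly negative: dependent given the collider X2
```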

Structural methods. Build the unoriented graph (using conditional independencies). Orient colliders. Add more arrows by constraint propagation, without creating new colliders. [Example on an 11-node network:] two arrows remain unoriented; the correct direction of one is 9 → 3, otherwise we would create a collider; edge 4-6 remains undetermined.
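A toy, non-authoritative sketch of these three steps on a four-node example (the skeleton and separating sets are assumed to have been found already by independence tests; only one Meek-style propagation rule is applied, which is enough here):

```python
# Constraint-based orientation sketch (PC-style), on a toy skeleton.
# Assumed inputs: an undirected skeleton and, for each non-adjacent pair,
# the separating set found by the independence tests.
skeleton = {frozenset(p) for p in [("A", "C"), ("B", "C"), ("C", "D")]}
sepsets  = {frozenset(("A", "B")): set(),      # A and B independent given the empty set
            frozenset(("A", "D")): {"C"},
            frozenset(("B", "D")): {"C"}}

directed = set()   # (tail, head) pairs
nodes = {x for e in skeleton for x in e}

# Step 1: orient colliders a -> c <- b whenever a - c - b, a and b are not
# adjacent, and c is NOT in their separating set.
for c in nodes:
    nbrs = [x for x in nodes if frozenset((x, c)) in skeleton]
    for i, a in enumerate(nbrs):
        for b in nbrs[i + 1:]:
            if frozenset((a, b)) not in skeleton and c not in sepsets.get(frozenset((a, b)), set()):
                directed |= {(a, c), (b, c)}

# Step 2 (one propagation rule): if a -> b and b - c with a, c not adjacent,
# orient b -> c, otherwise we would create a new collider at b.
for (a, b) in list(directed):
    for c in nodes:
        if (frozenset((b, c)) in skeleton and (b, c) not in directed
                and (c, b) not in directed and frozenset((a, c)) not in skeleton):
            directed.add((b, c))

print(directed)   # {('A', 'C'), ('B', 'C'), ('C', 'D')}
```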

… towards CYTO: using experiments to learn the causal structure

Manipulating a single variable (1). Assume that we already know the undirected graph (from observational data). [LUCAS diagram.] Smoking manipulated (disconnected from its direct causes): it remains predictive of LC, so it is a direct cause.

Manipulating a single variable (2). [LUCAS diagram.] Anxiety manipulated: it remains predictive of Lung Cancer; it is an indirect cause.

Manipulating a single variable (3). [LUCAS diagram.] Yellow Fingers manipulated: no longer predictive of LC; it is a consequence of a common cause (correlated with LC, but not a cause).

Manipulating a single variable (4). [LUCAS diagram.] Genetics manipulated: it remains predictive of LC and of Attention Disorder, so it is a direct cause of both.

Manipulating a single variable (5). [LUCAS diagram.] Attention Disorder manipulated: no longer predictive of Genetics, which confirms that the edge is oriented from Genetics to Attention Disorder (Genetics is the direct cause).
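A minimal simulation of the manipulation argument, revisiting the Yellow Fingers case from above (made-up parameters): once the manipulated variable is disconnected from its causes, it stays predictive of the target only if it is itself a cause.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000

def simulate(manipulate_yf=False):
    """Toy fork Smoking -> Yellow Fingers, Smoking -> Lung Cancer (invented parameters)."""
    S = rng.binomial(1, 0.3, n)
    if manipulate_yf:
        YF = rng.binomial(1, 0.5, n)            # set by the experimenter: disconnected from Smoking
    else:
        YF = rng.binomial(1, np.where(S == 1, 0.9, 0.1))
    LC = rng.binomial(1, np.where(S == 1, 0.4, 0.05))
    return YF, LC

for manipulated in (False, True):
    YF, LC = simulate(manipulated)
    print(manipulated, np.corrcoef(YF, LC)[0, 1])
# False -> clearly positive (YF predictive of LC through the common cause Smoking)
# True  -> ~0 (the manipulated YF is no longer predictive: it is not a cause of LC)
```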

The CYTO problem (Karen Sachs et al.): Causal Protein-Signaling Networks in human T cells. Learn a protein signaling network from multicolor flow cytometry data. N=11 proteins, P~800 samples per experimental condition, E=9 conditions. [Diagram of the T-cell signaling pathway, with measured proteins such as Raf, Mek1/2, Erk1/2, PLCg, PIP2, PIP3, PKC, PKA, Akt, p38, and JNK.] Perturbations are critical: activators (a-CD3, a-CD28, ICAM-2, PMA, b2cAMP) activate the pathways of interest, and inhibitors (G06976, AKT inh, Psitect, U0126, LY294002) inhibit some of the specific molecules measured.

… towards LOCANET: learning the causal structure without experiments, to predict the consequences of future actions

What if we cannot experiment? Experiments may be infeasible, costly, or unethical. Using only observations, we may want to predict the effect of new policies. Policies may consist of manipulating several variables.

Manipulating a few variables: LUCAS1 (manipulated distribution). [Diagram of the same network with a few variables manipulated; the Markov boundary of the target is highlighted.]

Manipulating all variables: LUCAS2 (manipulated distribution). [Diagram of the same network with all variables except the target manipulated; the Markov boundary of the target is highlighted.]

Causality challenge #1: causation and prediction Task: Predict the target (e.g., Lung cancer) in “unmanipulated” or “manipulated” test data. Goals: Introduce ML people to causal discovery problems. Investigate ties between causation and prediction. Findings: Participants used either causal or non-causal feature selection. Good causal discovery (feature set containing the “manipulated” MB) correlated with good predictions. However, some participants using non-causal feature selection obtained good prediction results.

Causality challenge #2: the LOCANET problem. Task: find the local causal structure around a given target variable (depth-3 network) in REGED, CINA, SIDO, MARTI. Goal: analyze more finely to what extent causal discovery methods recover the causal structure and how this affects prediction of the target values.

TIED: equivalent Markov boundaries

Equivalent Markov boundaries. Many almost identical measurements of the same (hidden) variable can lead to many statistically indistinguishable Markov boundaries of the target Y. [Diagram.]

Target Information Equivalence (TIE). Two disjoint subsets of variables V1 and V2 are Target Information Equivalent (TIE) with respect to target Y iff: V1 is dependent on Y; V2 is dependent on Y; V1 ⊥ Y | V2; and V2 ⊥ Y | V1. (Alexander Statnikov & Constantin Aliferis)
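A sketch of the TIE conditions on synthetic data (a hidden variable measured twice with small noise; linear partial correlation stands in for the conditional-independence tests):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

# Hidden variable H, two almost identical measurements V1 and V2, target Y driven by H.
H  = rng.normal(size=n)
V1 = H + 0.01 * rng.normal(size=n)
V2 = H + 0.01 * rng.normal(size=n)
Y  = H + rng.normal(size=n)

def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing out c."""
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(ra, rb)[0, 1]

print(np.corrcoef(V1, Y)[0, 1])   # clearly non-zero: V1 is dependent on Y
print(np.corrcoef(V2, Y)[0, 1])   # clearly non-zero: V2 is dependent on Y
print(partial_corr(V1, Y, V2))    # near zero: V1 independent of Y given V2
print(partial_corr(V2, Y, V1))    # near zero: V2 independent of Y given V1
# All four TIE conditions hold: {V1} and {V2} are statistically
# indistinguishable Markov boundaries of Y.
```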

TIE Data (TIED): exact equivalence. [Diagram: small example of the type of relationships implemented in TIED.] The following TIE relations hold in the data: TIE_Y(X1, X2), TIE_Y(X1, X3), TIE_Y(X1, X11), TIE_Y(X2, X3), TIE_Y(X2, X11), TIE_Y(X3, X11), TIE_X11(X1, X2), TIE_X11(X1, X3), TIE_X11(X2, X3). Notice that variables X1, X2, X3, X11, and Y are not deterministically related. (Alexander Statnikov & Constantin Aliferis)

Learning causal structure from "longitudinal" studies: SIGNET, PROMO

SIGNET: a plant signaling network. Plants lose water and take in carbon dioxide through microscopic pores. During drought, the plant hormone abscisic acid (ABA) inhibits pore opening (important for the genetic engineering of new drought-resistant plants). Unraveling the ABA signal transduction network took years of research. A recent dynamic model synthesizes many findings (Li, Assmann, Albert, PLoS, 2006). The model is used by Jenkins and Soni to generate artificial data. The problem is to reconstruct the network from the data.

Abscisic Acid Signaling Network (Li, Assmann, Albert, PLoS, 2006). [Network diagram.]

SIGNET: sample data. Boolean model with asynchronous updates; 43 nodes; 300 simulations. Sample rows (43-bit state vectors): 1011101110101101101101001010001011000011001, 1100001110111101101101111111011001011101011, 1100011110111110101101100011010001110101010, 1100001110111110101101100011000011110101010. [The slide also shows an example of asynchronous updates for a 4-node network.]
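For intuition, a minimal asynchronous Boolean simulation in the spirit of the SIGNET generator, using four invented rules rather than the 43 true rules the challenge asks participants to recover:

```python
import random

# Asynchronous update of a toy 4-node Boolean network (hypothetical rules,
# not the rules of the actual SIGNET model).
rules = {
    "A": lambda s: s["D"],
    "B": lambda s: s["A"] and not s["C"],
    "C": lambda s: s["A"] or s["B"],
    "D": lambda s: not s["C"],
}

def simulate(steps=20, seed=0):
    random.seed(seed)
    state = {v: random.random() < 0.5 for v in rules}
    trace = []
    for _ in range(steps):
        v = random.choice(list(rules))        # asynchronous: one randomly chosen node per step
        state[v] = rules[v](state)
        trace.append("".join("1" if state[x] else "0" for x in sorted(rules)))
    return trace

print("\n".join(simulate()))   # rows like the 43-bit SIGNET states, but with 4 nodes
```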

PROMO: simulated marketing task. 100 products, 1000 promotions, 3 years of daily data. Goal: quantify the effect of promotions on sales. [Diagram: promotions x products influence matrix.] (Jean-Philippe Pellet)

PROMO, schematically: [diagram: 1000 promotion variables influencing 100 product sales, plus other influences]. The difficulties include: non-i.i.d. samples; seasonal effects; promotions are binary while sales are continuous; and the problem is more about quantifying the relationships than determining the causal skeleton.
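One naive baseline (a sketch only, with scaled-down dimensions and same-day effects; the real task also involves lags and stronger non-i.i.d. structure) is to regress each product's sales on all promotions plus a seasonal term, then threshold the coefficients into a Boolean influence matrix.

```python
import numpy as np

rng = np.random.default_rng(6)
T, n_promo, n_prod = 1095, 20, 5      # 3 years of daily data, scaled-down dimensions

promos = rng.binomial(1, 0.05, size=(T, n_promo)).astype(float)
true_influence = rng.binomial(1, 0.1, size=(n_promo, n_prod))
season = np.sin(2 * np.pi * np.arange(T) / 365)           # shared seasonal effect
sales = promos @ (true_influence * 2.0) + season[:, None] + rng.normal(size=(T, n_prod))

# Regress sales on promotions plus the seasonal term, threshold the coefficients.
X = np.column_stack([promos, season, np.ones(T)])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
estimated = (np.abs(coef[:n_promo]) > 1.0).astype(int)

print((estimated == true_influence).mean())   # fraction of correctly recovered entries
```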

Pot-luck challenge: bring your own problem

From the NIPS 2006 workshop… 1. Predict the consequences of a manipulation (similar to a usual predictive modeling task, but the test data are no longer distributed in the same way as the training data; the system undergoes a manipulation to produce the test data). 2. Determine what manipulations are needed to reach a desired system state with maximum probability (e.g., select variables and propose values to achieve a certain value of a response/target variable, with perhaps a cost per variable). 3. Propose system queries to acquire more training data, i.e., design experiments, with perhaps an associated cost per variable and per sample, and perhaps with constraints on variables that cannot be controlled. 4. Determine all causal relationships between variables. 5. Determine a local causal region around a response/target variable (causal adjacency). 6. Determine the source cause(s) for a response/target variable. 7. Determine for all variables whether they are, with respect to a response/target variable: cause, effect, consequence of a common cause, cause of a common effect, or unrelated. 8. Predict the existence of unmeasured variables (not part of the set of variables provided in the data) that are potential confounders (common causes of an observed variable and the target). 9. Predict which variables called "relevant" by feature selection algorithms are potentially causally irrelevant because their correlation to the target is the result of an experimental artifact (e.g., sampling bias or systematic error). 10. Determine a causal order of all variables. 11. Determine the causal direction in time series data in which one variable is causing the other. 12. Determine the direction of time in a time series (mostly of fundamental rather than practical interest). 13. Incorporate prior knowledge in causal discovery. 14. Predict counterfactuals.

http://clopinet.com/causality September 15, 2008: challenge start. October 15, 2008: deadline for (optional) submission of milestone challenge results. October 24, 2008: workshop abstracts due. November 12, 2008: challenge ends (last day to submit challenge results). November 21, 2008: JMLR proceedings paper submission deadline. December 12, 2008: challenge results publicly released; workshop.

Prizes. Four prizes (free NIPS workshop entrance or $200): best solution to one or more problems, 3 prizes; best problem, 1 prize. All competitors must submit a 6-page paper. Criteria: performance/usefulness, novelty/originality, sanity, insight, reproducibility, clarity.