Causality challenge #2: Pot-Luck


1 Causality challenge #2: Pot-Luck
Isabelle Guyon, Clopinet; Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ.; André Elisseeff and Jean-Philippe Pellet, IBM Zürich; Gregory F. Cooper, University of Pittsburgh; Peter Spirtes, Carnegie Mellon

2 Motivations
Outline:
* Motivations
* Learning causal structure
  * Cross-sectional studies
    * … from experiments
    * … without experiments
    * Equivalent MB
  * Longitudinal studies
* Bring your own problem(s)

3 Causality Workbench
* February 2007: Project starts. Initial funding from the EU Pascal network.
* August 15, 2007: Two-year grant from the US National Science Foundation.
* December 15, 2007: Workbench goes live. First causality challenge: causation and prediction.
* June 3-4, 2008: WCCI 2008, workshop to discuss the results of the first challenge.
* September 15, 2008: Start of the pot-luck challenge. Target: NIPS 2008.
* Fall 2008: Start developing an interactive workbench.

4 Why a new challenge?
Causality challenge #1: Favor “depth”
* Single well defined task
* Rigor of performance assessment
Causality challenge #2: Favor “breadth”
* Many different tasks
* Encourage creativity

6 Pot-Luck challenge
* CYTO: Causal Protein-Signaling Networks in human T cells. Learn a protein signaling network from multicolor flow cytometry data. N=11 proteins, P~800 samples per experimental condition. E=9 conditions.
* LOCANET: LOcal CAusal NETwork. Find the local causal structure around a given target variable (depth 3 network) in REGED, CINA, SIDO, MARTI.
* PROMO: Simulated marketing task. Time series of 1000 promotion variables and 100 product sales. Predict a 1000x100 boolean influence matrix, indicating for each (i,j) element whether the ith promotion has a causal influence on the sales of the jth product. Data is provided as time series, with a daily value for each variable for three years.
* SIGNET: Abscisic Acid Signaling Network. Determine the set of 43 boolean rules that describe the interactions of the nodes within a plant signaling network. 300 separate Boolean pseudodynamic simulations of the true rules. Model inspired by a true biological system.
* TIED: Target Information Equivalent Dataset. Illustrates a case in which there are many equivalent Markov boundaries. Find them all.
(Tags shown next to the tasks on the slide: real, artif, self eval.)

7 Learning causal structure

8 What is causality? Many definitions.
Pragmatic (engineering) view: predicting the consequences of ACTIONS. Distinct from making predictions in a stationary environment. Canonical methodology: designed experiments. Causal discovery from observational data.

9 The “language” of causal Bayesian networks
Graph with random variables X1, X2, …, Xn as nodes. Dependencies represented by edges. Allows us to compute P(X1, X2, …, Xn) as ∏i P(Xi | Parents(Xi)). In a plain Bayesian network, edge directions have no causal meaning. Causal Bayesian network: edge directions indicate causality.
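
The factorization on this slide can be sketched in a few lines of Python. This is only an illustration: the three-node network (Smoking and Genetics as parents of Lung Cancer) follows the LUCAS example, but every probability value and the names `parents`, `cpt` and `joint` are assumptions made up for the sketch.

```python
from itertools import product

# Hypothetical three-node network: S -> LC <- G (all numbers are made up).
parents = {"S": (), "G": (), "LC": ("S", "G")}

# cpt[node][parent_values] = P(node = 1 | parents = parent_values)
cpt = {
    "S": {(): 0.3},
    "G": {(): 0.1},
    "LC": {(0, 0): 0.01, (0, 1): 0.10, (1, 0): 0.15, (1, 1): 0.60},
}

def joint(assignment):
    """P(X1, ..., Xn) as the product over i of P(Xi | Parents(Xi))."""
    p = 1.0
    for node, pa in parents.items():
        p_one = cpt[node][tuple(assignment[q] for q in pa)]
        p *= p_one if assignment[node] == 1 else 1.0 - p_one
    return p

def all_assignments():
    """Enumerate every binary assignment of the network's variables."""
    names = list(parents)
    for values in product((0, 1), repeat=len(names)):
        yield dict(zip(names, values))

# The factorized joint must sum to one over all assignments.
total = sum(joint(a) for a in all_assignments())
```

Summing `joint` over all assignments is a quick sanity check that the conditional tables define a proper distribution.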

10 Small example
[Figure: LUCAS0 (natural) network over the variables Lung Cancer, Smoking, Genetics, Coughing, Attention Disorder, Allergy, Anxiety, Peer Pressure, Yellow Fingers, Car Accident, Born an Even Day and Fatigue. The Markov boundary is highlighted.]

11 Arrows indicate “mechanisms”
If Lung Cancer (LC) is determined by Smoking (S) and Genetics (G):
In the language of BN, use the data table:
P(LC=1| S=1, G=1)=… , P(LC=0| S=1, G=1)=…
P(LC=1| S=1, G=0)=… , P(LC=0| S=1, G=0)=…
P(LC=1| S=0, G=1)=… , P(LC=0| S=0, G=1)=…
P(LC=1| S=0, G=0)=… , P(LC=0| S=0, G=0)=…
In the language of Structural Equation Models (SEM), use: LC = f(S, G) + noise, where usually f is a linear function.
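
To make the SEM form concrete, here is a minimal sketch that samples from a linear mechanism LC = f(S, G) + noise and then recovers f by least squares. The coefficients, the Gaussian inputs and the function names are assumptions chosen for illustration; they do not come from the slides, where LC is a binary variable.

```python
import random

# Hypothetical linear SEM: LC = A_S * S + A_G * G + noise (made-up coefficients).
A_S, A_G, NOISE_SD = 0.8, 0.5, 0.1

def sample_sem(n, seed=0):
    """Draw n samples (s, g, lc) from the structural equation."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        s = rng.gauss(0.0, 1.0)
        g = rng.gauss(0.0, 1.0)
        lc = A_S * s + A_G * g + rng.gauss(0.0, NOISE_SD)
        rows.append((s, g, lc))
    return rows

def fit_linear(rows):
    """Least-squares fit of lc = a*s + b*g (no intercept), via the 2x2 normal equations."""
    sss = sum(s * s for s, g, lc in rows)
    sgg = sum(g * g for s, g, lc in rows)
    ssg = sum(s * g for s, g, lc in rows)
    ssl = sum(s * lc for s, g, lc in rows)
    sgl = sum(g * lc for s, g, lc in rows)
    det = sss * sgg - ssg * ssg
    a = (ssl * sgg - sgl * ssg) / det
    b = (sgl * sss - ssl * ssg) / det
    return a, b

a_hat, b_hat = fit_linear(sample_sem(5000))
```

With enough samples, `a_hat` and `b_hat` should land close to the coefficients used to generate the data.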

12 Common simplifications
* Assume a Markov process
* Assume a DAG
* Assume causal sufficiency (no hidden common cause)
* Assume stability or faithfulness (no particular parameterization implying dependencies not reflected by the structure)
* Assume linearity of relationships
* Assume Gaussianity of PDF’s
* Discard relationships of low statistical significance
* Focus on a local neighborhood of a target variable
* Learn unoriented or partially oriented graphs
* Assume uniqueness of the Markov boundary

13 Cross-sectional study
How about time? [Figure: graph with nodes 1 to 11, labeled “Cross-sectional study”.]

14 Cross-sectional study
How about time? [Figure: the graph with nodes 1 to 11 labeled “Cross-sectional study”, next to repeated copies of the same nodes over time labeled “Longitudinal study”.]

15 Learning causal structure from “cross-sectional” studies: CYTO, LOCANET, TIED

16 Causal models as particular “generative models”
Imagine we have “prior knowledge” about a few alternative plausible “causal models” (we basically know the architecture). Fit the parameters of the model to data. Select the model based on goodness of fit (score), perhaps penalizing higher complexity models. Could two models have identical scores?

17 Key types of causal relationships 1
[Figure: LUCAS network, with one relationship highlighted.] Direct cause.

18 Key types of causal relationships 2
[Figure: LUCAS network, with Anxiety (AN), Smoking (S) and Lung Cancer (LC) highlighted.] Indirect cause (chain): AN ⊥ LC | S.

19 Key types of causal relationships 3
[Figure: LUCAS network, with Yellow Fingers (YF), Smoking (S) and Lung Cancer (LC) highlighted.] Confounder (fork): YF ⊥ LC | S. If we do not know about Smoking, we might wrongly conclude from the observed correlation that Yellow Fingers cause cancer. If we do know about Smoking, YF ⊥ LC | S does not tell us whether YF is an indirect cause of LC, because the same independence relation holds for chains.

20 How this might look in data
[Figure: plot of Lung cancer against Yellow Fingers.]

21 How this might look in data
[Figure: plot of Lung cancer against Yellow Fingers, with the Non-smoking and Smoking groups shown separately.] Simpson’s paradox: YF ⊥ LC | S.
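
A small simulation can show what the pooled and stratified views look like in data. This is a sketch under assumed numbers: Smoking (S) causes both Yellow Fingers (YF) and Lung Cancer (LC), there is no arrow between YF and LC, and all rates below are invented for the illustration.

```python
import random

def simulate_fork(n, seed=0):
    """Draw (S, YF, LC) samples where S causes both YF and LC (made-up rates)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        s = rng.random() < 0.5
        yf = rng.random() < (0.8 if s else 0.1)
        lc = rng.random() < (0.3 if s else 0.05)
        rows.append((s, yf, lc))
    return rows

def lc_rate_gap(rows):
    """Estimate P(LC | YF) - P(LC | not YF) from the given rows."""
    with_yf = [lc for _, yf, lc in rows if yf]
    without_yf = [lc for _, yf, lc in rows if not yf]
    return sum(with_yf) / len(with_yf) - sum(without_yf) / len(without_yf)

rows = simulate_fork(100_000)
pooled_gap = lc_rate_gap(rows)                              # all samples together
smoker_gap = lc_rate_gap([r for r in rows if r[0]])         # Smoking stratum only
nonsmoker_gap = lc_rate_gap([r for r in rows if not r[0]])  # Non-smoking stratum only
```

In the pooled data, Yellow Fingers looks predictive of Lung Cancer; within each Smoking stratum the gap shrinks toward zero, which is the conditional independence YF ⊥ LC | S.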

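22 Markov equivalence
X1 ⊥ Y | X2
Three graphs over X1, X2 and Y encode this same independence, each with its own factorization:
* X1 ← X2 → Y: P(X1, X2, Y) = P(X1 | X2, Y) P(Y | X2) P(X2)
* X1 → X2 → Y: P(X1, X2, Y) = P(Y | X2, X1) P(X2 | X1) P(X1)
* X1 ← X2 ← Y: P(X1, X2, Y) = P(X1 | X2, Y) P(X2 | Y) P(Y)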
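
A short derivation shows why data alone cannot separate the fork from the two chains. It assumes only the independence X1 ⊥ Y | X2 (so that P(X1 | X2, Y) = P(X1 | X2) and P(Y | X2, X1) = P(Y | X2)) and Bayes' rule; the graph labels in the right-hand column are added here for orientation.

```latex
\begin{align*}
P(X_1, X_2, Y)
  &= P(X_1 \mid X_2)\, P(Y \mid X_2)\, P(X_2)
     && \text{fork: } X_1 \leftarrow X_2 \rightarrow Y \\
  &= P(Y \mid X_2)\, P(X_2 \mid X_1)\, P(X_1)
     && \text{chain } X_1 \rightarrow X_2 \rightarrow Y,
        \text{ since } P(X_1 \mid X_2)\, P(X_2) = P(X_2 \mid X_1)\, P(X_1) \\
  &= P(X_1 \mid X_2)\, P(X_2 \mid Y)\, P(Y)
     && \text{chain } Y \rightarrow X_2 \rightarrow X_1,
        \text{ since } P(Y \mid X_2)\, P(X_2) = P(X_2 \mid Y)\, P(Y)
\end{align*}
```

All three products describe the same joint distribution, so any score based on goodness of fit gives them the same value.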

23 Key types of causal relationships 4
[Figure: LUCAS network, with Allergy (A), Coughing (C) and Lung Cancer (LC) highlighted.] Collider (V-structure): Allergy → Coughing ← Lung Cancer. Allergy and Lung Cancer are independent, but dependent given Coughing: A ⊥ LC holds, but A ⊥ LC | C does not. If we consider only coughing patients, A and LC may appear anti-correlated and we might wrongly conclude that A prevents LC.
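
The collider effect can be checked exactly on a toy distribution. In this sketch Allergy (A) and Lung Cancer (LC) are independent causes of Coughing (C); the probability values and helper names are assumptions invented for the example.

```python
# Hypothetical collider A -> C <- LC with made-up probabilities.
P_A = 0.2    # P(A = 1)
P_LC = 0.1   # P(LC = 1)
P_C = {(0, 0): 0.05, (1, 0): 0.70, (0, 1): 0.80, (1, 1): 0.95}  # P(C = 1 | A, LC)

def joint(a, lc, c):
    """P(A=a, LC=lc, C=c) = P(A) P(LC) P(C | A, LC)."""
    p_a = P_A if a else 1.0 - P_A
    p_lc = P_LC if lc else 1.0 - P_LC
    p_c = P_C[(a, lc)] if c else 1.0 - P_C[(a, lc)]
    return p_a * p_lc * p_c

def prob_lc(a, c=None):
    """P(LC=1 | A=a), or P(LC=1 | A=a, C=c) when c is given."""
    c_values = (0, 1) if c is None else (c,)
    num = sum(joint(a, 1, ci) for ci in c_values)
    den = sum(joint(a, lc, ci) for lc in (0, 1) for ci in c_values)
    return num / den

marginal_gap = prob_lc(1) - prob_lc(0)            # zero: A and LC are independent
coughing_gap = prob_lc(1, c=1) - prob_lc(0, c=1)  # negative among coughing patients
```

Among coughing patients, knowing that Allergy is present "explains away" the cough and lowers the probability of Lung Cancer, even though Allergy has no effect on Lung Cancer.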

24 How this might look in data
[Figure: plot of Lung cancer against Allergy.]

25 How this might look in data
[Figure: plot of Lung cancer against Allergy, with the Coughing=1 and Coughing=0 groups shown separately.]

26 No Markov equivalence
Colliders (V-structures): X1 → X2 ← Y. X1 and Y are marginally independent but become dependent given X2, so X1 ⊥ Y | X2 does not hold.
P(X1, X2, Y) = P(X2 | X1, Y) P(X1) P(Y)

27 Structural methods
[Figure: example graph with nodes 1 to 11, shown before and after orientation.]
* Build unoriented graph (using conditional independencies).
* Orient colliders.
* Add more arrows by constraint propagation without creating new colliders.
There remain 2 unoriented edges. The correct direction should be 9->3, otherwise we create a collider. 4-6 remains undetermined.
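
The "orient colliders" step can be sketched as follows, in the style of constraint-based algorithms. The sketch assumes the undirected skeleton and the separating sets (the conditioning sets that made each non-adjacent pair independent) are already available; the function name, the data layout and the small LUCAS-like example are assumptions for illustration, not the method used in the slides.

```python
from itertools import combinations

def orient_colliders(skeleton, sepset):
    """Return arrows (cause, effect) for every unshielded triple x - z - y
    in which z is NOT in the separating set of the non-adjacent pair (x, y)."""
    neighbors = {}
    for edge in skeleton:
        a, b = tuple(edge)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    arrows = set()
    for z, nbrs in neighbors.items():
        for x, y in combinations(sorted(nbrs), 2):
            if frozenset((x, y)) in skeleton:
                continue  # x and y are adjacent: shielded triple, nothing to orient
            if z not in sepset[frozenset((x, y))]:
                arrows.add((x, z))
                arrows.add((y, z))
    return arrows

# Hypothetical skeleton and separating sets for a fragment of the LUCAS example.
skeleton = {frozenset(p) for p in [
    ("Allergy", "Coughing"), ("LungCancer", "Coughing"),
    ("Smoking", "LungCancer"), ("Anxiety", "Smoking"),
]}
sepset = {
    frozenset(("Allergy", "LungCancer")): set(),            # independent marginally
    frozenset(("Coughing", "Smoking")): {"LungCancer"},
    frozenset(("Anxiety", "LungCancer")): {"Smoking"},
}
arrows = orient_colliders(skeleton, sepset)
```

Only the Allergy - Coughing - Lung Cancer triple is oriented here; the chain Anxiety - Smoking - Lung Cancer stays unoriented because Smoking belongs to the separating set of its endpoints.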

28 … towards CYTO: using experiments to learn the causal structure

29 Manipulating a single variable 1
Assume that we already know the undirected graph (from observational data). [Figure: LUCAS network; label: direct cause.] Smoking manipulated (disconnected from its direct causes): remains predictive of LC.

30 Manipulating a single variable 2
[Figure: LUCAS network; label: indirect cause.] Anxiety manipulated: remains predictive of Lung Cancer.

31 Manipulating a single variable 3
[Figure: LUCAS network; label: consequence of common cause (correlated, but not cause).] Yellow Fingers manipulated: no longer predictive of LC.

32 Manipulating a single variable 4
[Figure: LUCAS network; label: direct cause.] Genetics manipulated: remains predictive of LC and AD.

33 Manipulating a single variable 5
[Figure: LUCAS network; label: direct cause.] Attention disorder manipulated: no longer predictive of Genetics.
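
The effect of manipulating a variable can be sketched by cutting it off from its usual causes and recomputing the joint distribution. The toy model below keeps only Smoking (S), Yellow Fingers (YF) and Lung Cancer (LC), with S causing both others; the numbers, the `do` argument and the helper names are assumptions for illustration. In this reduced model Smoking has no parents, so manipulating it only changes its marginal.

```python
# Hypothetical model: S -> YF and S -> LC (made-up probabilities).
P_S = 0.5
P_YF_GIVEN_S = {0: 0.1, 1: 0.8}
P_LC_GIVEN_S = {0: 0.05, 1: 0.3}

def bern(p_one, value):
    """Probability of `value` for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one

def joint(s, yf, lc, do=None):
    """Joint probability in the natural graph, or in the manipulated graph when
    `do` maps a variable name to the P(1) of an external coin that sets it."""
    do = do or {}
    p_s = bern(do.get("S", P_S), s)
    p_yf = bern(do.get("YF", P_YF_GIVEN_S[s]), yf)  # manipulated YF ignores S
    p_lc = bern(P_LC_GIVEN_S[s], lc)
    return p_s * p_yf * p_lc

def lc_gap(var, do=None):
    """P(LC=1 | var=1) - P(LC=1 | var=0) for var in {"S", "YF"}."""
    rates = []
    for v in (1, 0):
        num = den = 0.0
        for s in (0, 1):
            for yf in (0, 1):
                if {"S": s, "YF": yf}[var] != v:
                    continue
                den += joint(s, yf, 0, do) + joint(s, yf, 1, do)
                num += joint(s, yf, 1, do)
        rates.append(num / den)
    return rates[0] - rates[1]

natural_yf_gap = lc_gap("YF")                       # YF predicts LC when observed
manipulated_yf_gap = lc_gap("YF", do={"YF": 0.5})   # no longer predictive
manipulated_s_gap = lc_gap("S", do={"S": 0.5})      # a cause stays predictive
```

A manipulated effect of a common cause loses its predictive power, while a manipulated cause keeps it; this asymmetry is what experiments exploit to orient edges.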

34 The CYTO problem
Causal Protein-Signaling Networks in human T cells. Learn a protein signaling network from multicolor flow cytometry data. N=11 proteins, P~800 samples per experimental condition. E=9 conditions.
[Figure: T-cell signaling pathway diagram with numbered perturbation sites. Nodes shown include PLCg, Erk1/2, Mek1/2, Raf, PKC, p38, Akt, MAPKKK, MEK4/7, JNK, LAT, Lck, VAV, SLP-76, RAS, PKA, CD28, CD3, PI3K, LFA-1, Cytohesin, Zap70, PIP3, PIP2, JAB-1.]
Activators: 1. a-CD3; 2. a-CD28; 3. ICAM-2; 4. PMA; 5. b2cAMP
Inhibitors: 6. G06976; 7. AKT inh; 8. Psitect; 9. U0126; 10. LY294002
Critical to our mission are perturbations, both activating the pathways of interest and inhibiting some of the specific molecules measured.
Karen Sachs et al.

35 … towards LOCANET: learning the causal structure without experiments to predict the consequences of future actions.

36 What if we cannot experiment?
* Experiments may be infeasible, costly or unethical.
* Using only observations, we may want to predict the effect of new policies.
* Policies may consist of manipulating several variables.

37 Manipulating a few variables
[Figure: LUCAS1 (manipulated) network in which a few variables are manipulated. The Markov boundary is highlighted.]

38 Manipulating all variables
[Figure: LUCAS2 (manipulated) network in which all variables are manipulated. The Markov boundary is highlighted.]

39 Causality challenge #1: causation and prediction
Task: Predict the target (e.g., Lung cancer) in “unmanipulated” or “manipulated” test data.
Goals:
* Introduce ML people to causal discovery problems.
* Investigate ties between causation and prediction.
Findings:
* Participants used either causal or non-causal feature selection.
* Good causal discovery (feature set containing the “manipulated” MB) correlated with good predictions.
* However, some participants using non-causal feature selection obtained good prediction results.

40 Causality challenge #2: The LOCANET problem
Task: Find the local causal structure around a given target variable (depth 3 network) in REGED, CINA, SIDO, MARTI.
Goal: Analyze in finer detail the extent to which causal discovery methods recover the causal structure, and how this affects the prediction of the target values.

41 TIED: Equivalent Markov boundaries

42 Equivalent Markov boundaries
[Figure: target Y and its Markov boundary.] Many almost identical measurements of the same (hidden) variable can lead to many statistically indistinguishable Markov boundaries.

43 Target Information Equivalence (TIE)
Two disjoint subsets of variables V1 and V2 are Target Information Equivalent (TIE) with respect to target Y iff:
* V1 and Y are dependent: not (V1 ⊥ Y)
* V2 and Y are dependent: not (V2 ⊥ Y)
* V1 ⊥ Y | V2
* V2 ⊥ Y | V1
Alexander Statnikov & Constantin Aliferis
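
The TIE conditions can be checked numerically on a small exact distribution. In this sketch (an assumed construction, not the TIED generator) a hidden class H drives the target Y, and X1 and X2 each encode H plus an independent nuisance bit, so neither variable determines the other. Dependence is measured with conditional mutual information, and V1, V2 are single variables rather than general subsets.

```python
from itertools import product
from math import log

P_Y1_GIVEN_H = {0: 0.2, 1: 0.9}  # made-up numbers

def build_joint():
    """Exact joint P(x1, x2, y) with X1 = 2H + B1 and X2 = 2H + B2."""
    joint = {}
    for h, b1, b2, y in product((0, 1), repeat=4):
        p_y = P_Y1_GIVEN_H[h] if y == 1 else 1.0 - P_Y1_GIVEN_H[h]
        key = (2 * h + b1, 2 * h + b2, y)
        joint[key] = joint.get(key, 0.0) + 0.5 * 0.5 * 0.5 * p_y
    return joint

def marginal(joint, idx):
    """Marginal distribution over the tuple positions listed in idx."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[t] for t in idx)
        out[sub] = out.get(sub, 0.0) + p
    return out

def cond_mutual_info(joint, i, j, given=()):
    """I(Xi; Xj | X_given) in nats; variables are tuple positions."""
    g = tuple(given)
    p_ijg = marginal(joint, (i, j) + g)
    p_ig = marginal(joint, (i,) + g)
    p_jg = marginal(joint, (j,) + g)
    p_g = marginal(joint, g)
    total = 0.0
    for key, p in p_ijg.items():
        a, b, c = key[0], key[1], key[2:]
        total += p * log(p * p_g[c] / (p_ig[(a,) + c] * p_jg[(b,) + c]))
    return total

def is_tie(joint, v1, v2, y, tol=1e-9):
    """Check the four TIE conditions for single variables v1, v2 and target y."""
    return (cond_mutual_info(joint, v1, y) > tol
            and cond_mutual_info(joint, v2, y) > tol
            and abs(cond_mutual_info(joint, v1, y, (v2,))) < tol
            and abs(cond_mutual_info(joint, v2, y, (v1,))) < tol)

joint = build_joint()
x1_x2_are_tie = is_tie(joint, 0, 1, 2)
```

Each of X1 and X2 alone carries all the information about Y that the other carries, so either one could serve as a Markov boundary of Y in this toy distribution.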

44 TIE Data (TIED)
Exact equivalence. [Figure: small example of the type of relationships implemented in TIED.] The following TIE relations hold in the data:
* TIE_Y(X1, X2), TIE_Y(X1, X3), TIE_Y(X1, X11)
* TIE_Y(X2, X3), TIE_Y(X2, X11), TIE_Y(X3, X11)
* TIE_X11(X1, X2), TIE_X11(X1, X3), TIE_X11(X2, X3)
Notice that variables X1, X2, X3, X11, and Y are not deterministically related.
Alexander Statnikov & Constantin Aliferis

45 Learning causal structure from “longitudinal” studies: SIGNET PROMO

46 SIGNET: a plant signaling network
Plants lose water and take in carbon dioxide through microscopic pores. During drought, the plant hormone abscisic acid (ABA) inhibits pore opening (important for the genetic engineering of new drought-resistant plants). Unraveling the ABA signal transduction network took years of research. A recent dynamic model synthesizes many findings (Li, Assmann, Albert, PLOS, 2006). The model is used by Jenkins and Soni to generate artificial data. The problem is to reconstruct the network from the data.

47 Abscisic Acid Signaling Network
[Figure: network diagram.] Li, Assmann, Albert, PLOS, 2006

48 SIGNET: sample data
* Boolean model; asynchronous updates
* 43 nodes
* 300 simulations
[Figure: node states over time. Example of asynchronous updates for a 4-node network.]
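
One way to read "asynchronous updates" is that, in every round, the nodes are updated one at a time in a random order, each seeing the values already changed earlier in that round. The sketch below follows that reading for a 4-node network; the Boolean rules, node names and the `simulate` function are hypothetical and are not the rules of the SIGNET model.

```python
import random

# Hypothetical Boolean rules for a 4-node network (not the SIGNET rules).
RULES = {
    "A": lambda s: s["A"],                  # input node keeps its value
    "B": lambda s: s["A"] and not s["D"],
    "C": lambda s: s["A"] or s["B"],
    "D": lambda s: s["B"] and s["C"],
}

def simulate(initial, n_rounds, seed=0):
    """Return the list of states after each round of asynchronous updates."""
    rng = random.Random(seed)
    state = dict(initial)
    trajectory = [dict(state)]
    for _ in range(n_rounds):
        order = list(RULES)
        rng.shuffle(order)  # a fresh random update order for every round
        for node in order:
            # Each rule reads the current state, including this round's earlier updates.
            state[node] = bool(RULES[node](state))
        trajectory.append(dict(state))
    return trajectory

trajectory = simulate({"A": True, "B": False, "C": False, "D": False}, n_rounds=10)
```

Running `simulate` with different seeds gives different trajectories from the same rules, which is the kind of variation a reconstruction method would have to cope with.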

49 PROMO: simulated marketing task
* 100 products
* 1000 promotions
* 3 years of daily data
Goal: quantify the effect of promotions on sales.
[Figure: promotions and products.] Jean-Philippe Pellet

50 PROMO: schematically…
[Figure: schematic with blocks labeled “other”, 1000 and 100.]
The difficulties include:
* non iid samples
* seasonal effects
* promotions are binary, sales are continuous
* the problem is more quantifying the relationships than determining the causal skeleton

51 Pot-luck challenge: Bring your own problem

52 From NIPS 2006 workshop…
1. Predict the consequences of a manipulation (similar to a usual predictive modeling task, but the test data is no longer distributed in the same way as the training data; the system undergoes a manipulation to produce the test data).
2. Determine what manipulations are needed to reach a desired system state with maximum probability (e.g., select variables and propose values to achieve a certain value of a response/target variable, with perhaps a cost per variable).
3. Propose system queries to acquire more training data, i.e. design experiments, with perhaps an associated cost per variable and per sample and perhaps with constraints on variables that cannot be controlled.
4. Determine all causal relationships between variables.
5. Determine a local causal region around a response/target variable (causal adjacency).
6. Determine the source cause(s) for a response/target variable.
7. Determine for all variables whether they are, with respect to a response/target variable: cause, effect, consequence of a common cause, cause of a common effect, or unrelated.
8. Predict the existence of unmeasured variables (not part of the set of variables provided in the data), which are potential confounders (are common causes of an observed variable and the target).
9. Predict which variables called “relevant” by feature selection algorithms are potentially causally irrelevant because their correlation to the target is the result of an experimental artifact (e.g., sampling bias or systematic error).
10. Determine a causal order of all variables.
11. Determine a causal direction in time series data in which one variable is causing the other.
12. Determine the direction of time in a time series (mostly of fundamental rather than practical interest).
13. Incorporate prior knowledge in causal discovery.
14. Predict counterfactuals.

53 http://clopinet.com/causality
* September 15, 2008: challenge start.
* October 15, 2008: deadline for (optional) submission of milestone challenge results.
* October 24, 2008: workshop abstracts due.
* November 12, 2008: challenge ends (last day to submit challenge results).
* November 21, 2008: JMLR proceedings paper submission deadline.
* December 12, 2008: challenge results publicly released; workshop.

54 Prizes
Four prizes (free NIPS workshop entrance or $200):
* Best solution to one or more problems: 3 prizes.
* Best problem: 1 prize.
All competitors must submit a 6-page paper.
Criteria: performance/usefulness, novelty/originality, sanity, insight, reproducibility, clarity.

