
276 Causal Discovery Methods Using Causal Probabilistic Networks MEDINFO 2004, T02: Machine Learning Methods for Decision Support and Discovery Constantin F. Aliferis & Ioannis Tsamardinos Discovery Systems Laboratory Department of Biomedical Informatics Vanderbilt University

277 Desire for Causal Knowledge
Diagnosis: knowing that "people with cancer often have yellow-stained fingers and feel fatigue", diagnose lung cancer (causal knowledge NOT required)
Prevention: need to know that "smoking causes lung cancer" to reduce the risk of cancer (causal knowledge required)
Treatment: knowing that "the presence of protein X causes cancer", inactivate protein X using medicine Y that causes X to be inactive (causal knowledge required)

278 Importance of Causal Discovery Today
What SNP combination causes what disease
How genes and proteins are organized in complex causal regulatory networks
How behaviour causes disease
How genotype causes differences in response to treatment
How the environment modifies or even supersedes the normal causal function of genes

279 What is Causality?
A problem thousands of years old, and still debated.
Operational informal definition: assume the existence of a mechanism M capable of setting values for a variable A; we say that A can be manipulated by M to take the desired values. Variable A causes variable B if, in a hypothetical randomized controlled experiment in which A is randomly manipulated via M (i.e., all possible values a_i of A are randomly assigned to A via M), we would observe in the sample limit that P(B = b | A = a_i) ≠ P(B = b | A = a_j) for some i ≠ j.
The definition is stochastic. Problems: self-reference, ignores time-dependence, ignores variables that need to be co-manipulated, etc.
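The randomized-manipulation definition above can be sketched as a simulation. Everything here is an illustrative assumption, not from the tutorial: a made-up mechanism by which B responds to a manipulated A, and made-up probabilities.

```python
import random

# Hypothetical response of B to a manipulated value of A (assumed numbers):
# P(B=1 | A=1) = 0.8, P(B=1 | A=0) = 0.2.
def mechanism_b(a, rng):
    return 1 if rng.random() < (0.8 if a == 1 else 0.2) else 0

def randomized_experiment(n, rng):
    """The idealized RCT of the definition: a mechanism M randomly
    assigns values to A, and we estimate P(B=1 | A=a) for each a."""
    totals = {0: 0, 1: 0}
    successes = {0: 0, 1: 0}
    for _ in range(n):
        a = rng.randint(0, 1)          # M randomly manipulates A
        totals[a] += 1
        successes[a] += mechanism_b(a, rng)
    return {a: successes[a] / totals[a] for a in (0, 1)}

rng = random.Random(0)
est = randomized_experiment(100_000, rng)
# est[1] differs from est[0], so A causes B under the definition.
```

With a large sample the two estimated conditionals separate clearly, which is exactly the inequality P(B = b | A = a_i) ≠ P(B = b | A = a_j) the definition asks for.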

280 Causation and Association
What is the relationship between the two?
If A causes B, are A and B always associated?
If A is associated with B, are they always causes or effects of each other? (Directly? Indirectly? Conditionally? Unconditionally?)

281 Statistical Indistinguishability
[Diagram: three structures that observational data cannot distinguish. S1: Smoking → Lung CA; S2: Smoking ← Lung CA; S3: Smoking ← Gene → Lung CA]

282 Randomized Controlled Trials
[Diagram: the same three structures S1–S3 under random manipulation of Smoking]
Association between Smoking and Lung CA is still retained after manipulating Smoking only if Smoking actually causes Lung CA; the manipulation thus distinguishes the three structures.

283 RCTs Are Not Always Feasible!
Unethical (smoking)
Costly / time-consuming (gene manipulation, epidemiology)
Impossible (astronomy)
Extremely large number of experiments would be required

284 Large-Scale Causal Discovery without RCTs? Heuristics to the rescue… What is a heuristic? A condition such that, when it holds, a causal association most probably also holds.

285 Pitfalls of Causal Heuristics
'If A is a robust and strong predictor of T, then A is likely a cause of T'
Examples: feature selection; predictive rules.
In the example below, Tuberculosis is a strong predictor of Lung Cancer (when Haemoptysis is included as a predictor), yet it is not causally associated with Lung Cancer.
[Diagram: Tuberculosis → Haemoptysis ← Lung Ca]

286 Pitfalls of Causal Heuristics
'The closer A and T are in a causal sense, the stronger their correlation' (this heuristic localizes causality as well)
Poor Fitness may be a stronger (univariate) predictor of lung cancer than either Occupation or Smoking individually; yet the latter predictors are causally "closer" to Lung Cancer.
[Diagram: causal graph over Lung Cancer, Smoking, Occupation, Poor Fitness, Fatigue, Anemia, and Stress, highlighting the (potentially) smallest predictor set with optimal accuracy]

287 The Problem with Causal Discovery
Causal heuristics are unreliable
Causation is difficult to define
RCTs are not always feasible
Much accepted "causal knowledge" does not have RCT backing!

288 Formal Computational Causal Discovery from Observational Data
Formal algorithms exist! Most are based on a graphical-probabilistic language called "Causal Probabilistic Networks" (a.k.a. "Causal Bayesian Networks").
They have well-characterized properties:
what types of causal relations they can learn
under which conditions
what kinds of errors they may make

289 Types of Causal Discovery Questions
What will be the effect of a manipulation on the system?
Is A causing B, B causing A, or neither?
Is A causing B directly (i.e., with no other observed variables interfering)?
What is the smallest set of variables for optimally effective manipulation of A?
Can we infer the presence of hidden confounding factors/variables?

290 A Formal Language for Representing Causality
Bayesian Networks: edges denote probabilistic dependence; Markov condition: a node N is independent of its non-descendants given its parents; support probabilistic reasoning.
Causal Bayesian Networks: edges represent direct causal effects; Causal Markov condition: a node N is independent of its non-descendants given its direct causes; support probabilistic reasoning + causal inference.
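A minimal sketch of the Markov condition at work, assuming a hypothetical three-node network A → B, A → C with made-up CPTs: the condition licenses factorizing the joint distribution as a product of P(node | parents).

```python
from itertools import product

# Hypothetical network A -> B, A -> C; the CPTs are assumed numbers.
P_A = {1: 0.3, 0: 0.7}
P_B_given_A = {(1, 1): 0.9, (1, 0): 0.1, (0, 1): 0.2, (0, 0): 0.8}  # (a, b) -> p
P_C_given_A = {(1, 1): 0.6, (1, 0): 0.4, (0, 1): 0.1, (0, 0): 0.9}  # (a, c) -> p

def joint(a, b, c):
    """Markov condition: each node depends only on its parents, so
    P(A=a, B=b, C=c) = P(a) * P(b | a) * P(c | a)."""
    return P_A[a] * P_B_given_A[(a, b)] * P_C_given_A[(a, c)]

# The factorization defines a proper joint distribution (it sums to 1),
# and B is independent of its non-descendant C given its parent A.
total = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))
```

Here B ⟂ C | A holds by construction: dividing the joint by P(A = a) gives P(b | a) · P(c | a), a product of the two conditionals.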

291 Causal Bayesian Networks
There may be many (non-causal) BNs that capture the same distribution. All such BNs have the same edges (ignoring direction) and the same v-structures; they are statistically equivalent.
[Diagram: three statistically equivalent networks over the variables A, B, C, D, G]

292 Causal Bayesian Networks
If there is a (faithful) Causal Bayesian Network that captures the data-generation process, it has to have the same edges and the same v-structures as any (faithful) Bayesian Network induced from the data. Therefore:
we can infer what the direct causal relations are;
we can infer some of the directions of the edges.
[Diagram: example orientations among Gene1, Gene2, and Gene3]
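The equivalence criterion above (same edges ignoring direction, same v-structures) can be checked mechanically. A sketch, using hypothetical three-node example graphs:

```python
# Hypothetical example DAGs, each a set of directed edges (tail, head).
chain    = {("A", "B"), ("B", "C")}   # A -> B -> C
fork     = {("B", "A"), ("B", "C")}   # A <- B -> C
collider = {("A", "B"), ("C", "B")}   # A -> B <- C

def skeleton(dag):
    """The DAG's edges with direction ignored."""
    return {frozenset(edge) for edge in dag}

def v_structures(dag):
    """Colliders X -> Z <- Y with X and Y non-adjacent."""
    skel = skeleton(dag)
    return {(frozenset((x, y)), z)
            for (x, z) in dag for (y, z2) in dag
            if z == z2 and x != y and frozenset((x, y)) not in skel}

def equivalent(d1, d2):
    """Two DAGs are statistically equivalent iff they share the same
    skeleton and the same v-structures."""
    return skeleton(d1) == skeleton(d2) and v_structures(d1) == v_structures(d2)

# chain and fork are statistically equivalent; the collider is not.
```

This is why observational data can orient some edges (those in v-structures) but not others: any member of the equivalence class fits the data equally well.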

293 Faithfulness
Faithfulness: d-separation ⟺ independence.
Intuitively, an open path between A and B means there is association between them in the data.
The previous discussion holds for faithful BNs only; faithful BNs form a very large class of BNs.

294 Learning Bayesian Networks: Constraint-Based Approach
An edge X – Y (of unknown direction) exists if and only if Dep(X, Y | S) for all sets of nodes S (this allows discovery of the edges): test all subsets; if Dep(X, Y | S) holds for every S, add the edge, otherwise do not.
If the structure C – F – B exists (with C and B non-adjacent) and Dep(C, B | S) for every set S that contains F, then orient the edges as the collider C → F ← B.

295 Learning Bayesian Networks: Constraint-Based Approach
Tests of conditional dependence and independence are run on the data, estimated using the G² statistic, conditional mutual information, etc. Structure and orientation are inferred from the results of these tests, based on the assumption that the tests are accurate.
The larger the number of nodes in the conditioning set, the more samples are required to estimate the dependence: Ind(A, B | C, D, E) requires more samples than Ind(A, B | C, D).
For relatively sparse networks, we can d-separate two nodes conditioning on a couple of variables (sample requirements in the low hundreds).
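A sketch of the G² test mentioned above (G² = 2 Σ O · ln(O/E)), run on simulated data in which a common cause C makes A and B marginally dependent but conditionally independent given C; the data-generating probabilities are assumptions for illustration.

```python
import math
import random

def g2(table):
    """G^2 = 2 * sum_ij O_ij * ln(O_ij / E_ij) for a contingency table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:
                exp = rows[i] * cols[j] / n   # expected count under independence
                stat += 2.0 * obs * math.log(obs / exp)
    return stat

def crosstab(xs, ys):
    t = [[0, 0], [0, 0]]
    for x, y in zip(xs, ys):
        t[x][y] += 1
    return t

# Simulated data (assumed numbers): C is a common cause of A and B.
rng = random.Random(1)
C = [rng.randint(0, 1) for _ in range(20_000)]
A = [1 if rng.random() < (0.8 if c else 0.2) else 0 for c in C]
B = [1 if rng.random() < (0.7 if c else 0.3) else 0 for c in C]

g_marginal = g2(crosstab(A, B))          # large: Dep(A, B) marginally
g_conditional = sum(                     # small (near its df): Ind(A, B | C)
    g2(crosstab([a for a, c in zip(A, C) if c == s],
                [b for b, c in zip(B, C) if c == s]))
    for s in (0, 1))
```

Under independence G² is asymptotically χ²-distributed, so the marginal statistic (in the hundreds here) rejects Ind(A, B), while the conditional statistic stays near its degrees of freedom, consistent with Ind(A, B | C).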

296 Learning Bayesian Networks: Search-and-Score
Score each possible structure, e.g. with the Bayesian score P(Structure | Data), and search the space of all possible BN structures for the one that maximizes the score. The search space is too large, so greedy or local search is typical. Greedy search: add, delete, or reverse the edge that increases the score the most.
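The search-and-score idea can be sketched with a decomposable BIC-style score (an assumption here; the slide mentions the Bayesian score) on hypothetical binary data, comparing the empty graph against the two single-edge models:

```python
import math
import random

def bic_score(data, dag, nodes):
    """Decomposable BIC-style score for binary data: maximized
    log-likelihood of each node given its parents, minus a
    0.5 * ln(N) penalty per free parameter. Higher is better."""
    n = len(data)
    score = 0.0
    for v in nodes:
        parents = [u for (u, w) in dag if w == v]
        counts = {}                       # parent config -> [#v=0, #v=1]
        for row in data:
            key = tuple(row[p] for p in parents)
            counts.setdefault(key, [0, 0])[row[v]] += 1
        for c0, c1 in counts.values():
            for c in (c0, c1):
                if c:
                    score += c * math.log(c / (c0 + c1))
        score -= 0.5 * math.log(n) * (2 ** len(parents))  # complexity penalty
    return score

# Hypothetical data: A is a strong cause of B (assumed probabilities).
rng = random.Random(2)
data = []
for _ in range(5000):
    a = rng.randint(0, 1)
    b = 1 if rng.random() < (0.9 if a else 0.1) else 0
    data.append({"A": a, "B": b})

nodes = ["A", "B"]
scores = {name: bic_score(data, dag, nodes)
          for name, dag in [("empty", set()),
                            ("A->B", {("A", "B")}),
                            ("B->A", {("B", "A")})]}
# The empty graph loses to both edge models; A->B and B->A score the
# same, which is why score-based search returns an equivalence class.
```

A full greedy search would repeatedly apply the add/delete/reverse move with the largest score gain; because the score decomposes per node, only the scores of the affected nodes need recomputation after each move.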

297 The PC Algorithm (Spirtes, Glymour, Scheines 1993)
Phase I: edge detection. Start with a fully connected undirected network. For each subset size n = 0, 1, …: for each remaining edge A – B, if there is a subset S of size n of the variables still connected to A or B such that Ind(A; B | S), remove the edge A – B.
Phase II: edge orientation. For every possible v-structure A – B – C with A – C missing: if Dep(A, C | B), orient A → B ← C. Then, repeatedly, until no more orientations are possible: if A → B – C with A – C missing, orient it as A → B → C; if there is a path A → … → B, orient the edge A – B as A → B.
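Phase I (edge detection) can be sketched against an independence oracle. The oracle below simply hard-codes the independencies of the worked trace that follows (true graph assumed to be A→B, B→C, B→D, C→E, D→E); a real implementation would replace it with a statistical test on data.

```python
from itertools import combinations

# Hard-coded independence facts from the worked example (an assumption
# of this sketch): (pair, conditioning set) entries.
INDEPENDENCIES = {
    (frozenset("AC"), frozenset("B")),
    (frozenset("AD"), frozenset("B")),
    (frozenset("AE"), frozenset("B")),
    (frozenset("CD"), frozenset("B")),
    (frozenset("BE"), frozenset("CD")),
}

def ind(x, y, s):
    return (frozenset((x, y)), frozenset(s)) in INDEPENDENCIES

def pc_skeleton(nodes):
    """Phase I of PC: start fully connected, then for growing
    conditioning-set sizes remove any edge X - Y that some set S
    renders independent."""
    edges = {frozenset(pair) for pair in combinations(nodes, 2)}

    def adjacent(x):
        return {next(iter(e - {x})) for e in edges if x in e}

    for n in range(len(nodes) - 1):          # conditioning-set size 0, 1, ...
        for e in list(edges):
            x, y = tuple(e)
            candidates = (adjacent(x) | adjacent(y)) - {x, y}
            for s in combinations(sorted(candidates), n):
                if ind(x, y, s):
                    edges.discard(e)         # found Ind(X; Y | S): drop edge
                    break
    return edges

skel = pc_skeleton("ABCDE")
# Recovered skeleton: A-B, B-C, B-D, C-E, D-E
```

Conditioning only on variables still adjacent to X or Y is what keeps the number of tests manageable for sparse networks.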

298 Trace Example of the PC Algorithm
True graph: A → B, B → C, B → D, C → E, D → E.
Start with a fully connected undirected network as the current candidate graph.
[Slides show the true graph alongside the current candidate graph.]

299 Trace Example of the PC Algorithm
For subsets of size 0: for each remaining edge A – B, if there is a subset S of size 0 of the variables still connected to A or B such that Ind(A; B | S), remove the edge A – B.
No independencies discovered.

300 Trace Example of the PC Algorithm
For subsets of size 1: for each remaining edge A – B, if there is a subset S of size 1 of the variables still connected to A or B such that Ind(A; B | S), remove the edge A – B.
Independencies found: Ind(A, C | B), Ind(A, E | B), Ind(A, D | B), Ind(C, D | B); the corresponding edges are removed.

301 Trace Example of the PC Algorithm
For subsets of size 2: for each remaining edge A – B, if there is a subset S of size 2 of the variables still connected to A or B such that Ind(A; B | S), remove the edge A – B.
Independency found: Ind(B, E | C, D); the edge B – E is removed.

302 Trace Example of the PC Algorithm
Phase II: edge orientation. For the possible v-structure A – B – C with A – C missing: if Dep(A, C | B), orient A → B ← C. The condition does not hold.

303 Trace Example of the PC Algorithm
Phase II: edge orientation. For the possible v-structure A – B – D with A – D missing: if Dep(A, D | B), orient A → B ← D. The condition does not hold.

304 Trace Example of the PC Algorithm
Phase II: edge orientation. For the possible v-structure C – B – D with C – D missing: if Dep(C, D | B), orient C → B ← D. The condition does not hold.

305 Trace Example of the PC Algorithm
Phase II: edge orientation. For the possible v-structure C – E – D with C – D missing: if Dep(C, D | E), orient C → E ← D. The condition holds, so the edges are oriented as C → E ← D.

306 Trace Example of the PC Algorithm
Final output: the undirected edges A – B, B – C, B – D together with the oriented v-structure C → E ← D.

307 The Max-Min Hill-Climbing Algorithm (Brown, Tsamardinos, Aliferis, MedInfo 2004)
Based on the same ideas as PC, and also uses tests of conditional independence, but with a different search strategy for identifying interesting independence relations. Produces results of similar quality to PC, yet scales up to tens of thousands of variables (PC can only handle a couple of hundred variables).

308 Local Causal Discovery
Max-Min Parents and Children: returns the parents and children of a target variable (Tsamardinos, Aliferis, Statnikov, KDD 2003). Scales up to tens of thousands of variables.
[Diagram: local neighbourhood of a target variable within a larger network]

309 Local Causal Discovery
Max-Min Markov Blanket: returns the Markov blanket of a target variable (its parents, children, and spouses). Scales up to tens of thousands of variables.
HITON (Aliferis, Tsamardinos, Statnikov, AMIA 2003) is a close variant: a different heuristic, plus wrapping with a classifier to optimize for variable-selection tasks.
[Diagram: local neighbourhood of a target variable within a larger network]

310 Local Causal Discovery – A Different Flavor (Mani & Cooper 2000, 2001; Silverstein, Brin, Motwani, Ullman)
Rule 1: if A, B, C are pairwise dependent, Ind(A, C | B), and A has no causes within the observed variables (e.g., temperature in a gene-expression experiment), then A → … → B → … → C.
Rule 2: if Dep(A, B | ∅), Dep(A, C | ∅), Ind(B, C | ∅), and Dep(B, C | A), then B → … → A ← … ← C.
These rules discover a coarser causal model (ancestor relations and indirect causality).
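The two rules can be written directly against an abstract dependence oracle; the oracle facts below are an illustrative assumption encoding a hypothetical collider B → A ← C.

```python
# Abstract (in)dependence oracle: dep(x, y, given) says whether x and y
# are dependent given the conditioning tuple. The facts encode a
# hypothetical collider B -> A <- C.
facts = {("A", "B", ()), ("A", "C", ()), ("B", "C", ("A",))}

def dep(x, y, given=()):
    x, y = sorted((x, y))
    return (x, y, tuple(sorted(given))) in facts

def rule1(a, b, c):
    """A, B, C pairwise dependent, Ind(A, C | B), and A known to have no
    observed causes  =>  A -> ... -> B -> ... -> C."""
    return dep(a, b) and dep(b, c) and dep(a, c) and not dep(a, c, (b,))

def rule2(a, b, c):
    """Dep(A, B), Dep(A, C), Ind(B, C), Dep(B, C | A)
    =>  B -> ... -> A <- ... <- C."""
    return dep(a, b) and dep(a, c) and not dep(b, c) and dep(b, c, (a,))

# For this oracle, rule2 fires (A is a collider) and rule1 does not.
```

Rule 2 is the classic collider signature: B and C are marginally independent, yet conditioning on A induces a dependence between them, so A must lie downstream of both.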

311 FCI – Causal Discovery with Hidden Confounders
Observed test results (SE = Social Environment, SM = Smoking, LC = Lung CA, OC = Occupation): Ind(SE, LC | ∅), Dep(SE, LC | SM), Ind(SM, OC | ∅), Dep(SM, OC | LC).
The only model consistent with all the tests is one that has a hidden confounder.
[Diagram: Social Env. → Smoking ← Gene → Lung CA ← Occupation, with Gene unobserved]

312 Other Causal Discovery Algorithms
Large body of work on Bayesian (or other) search-and-score methods; still a similar set of assumptions (Neapolitan 2003)
Learning with linear Structural Equation Models in systems in static equilibrium, which allows feedback loops (Richardson, Spirtes 1999)
Learning in the presence of selection bias (Cooper 1995)
Learning from mixtures of experimental and observational data (Cooper, Yoo 1999)

313 Conclusions
It is possible to perform causal discovery from observational data without Randomized Controlled Trials!
Heuristic methods are typically used instead of formal causal discovery methods, yet their properties and relative efficacy are unknown.
Causal discovery algorithms also make assumptions, but they have well-characterized properties.
There is a plethora of algorithms with different properties and assumptions for causal discovery.
There is still plenty of work to be done.

314 Suggested Further Reading
Neapolitan, Learning Bayesian Networks, Prentice Hall, 2003
Glymour, Cooper (eds.), Computation, Causation, and Discovery, AAAI Press/MIT Press, 1999
Jordan (ed.), Learning in Graphical Models, MIT Press, 1998
Spirtes, Glymour, Scheines, Causation, Prediction, and Search, MIT Press, 2000