
An Introduction to Causal Modeling and Discovery Using Graphical Models. Greg Cooper, University of Pittsburgh.

Overview: Introduction, Representation, Inference, Learning, Evaluation

What Is Causality? Much consideration in philosophy. I will treat it as a primitive. Roughly, if we manipulate something and something else changes, then the former causally influences the latter.

Why Is Causation Important? Causal issues arise in most fields, including medicine, business, law, economics, and the sciences. An intelligent agent is continually considering what to do next in order to change the world (including the agent's own mind). That is a causal question.

Representing Causation Using Causal Bayesian Networks A causal Bayesian network (CBN) represents some entity (e.g., a patient) that we want to model causally. Features of the entity are represented by variables/nodes in the CBN. Direct causation is represented by arcs.

An Example of a Causal Bayesian Network Structure [Diagram: a causal network over History of Smoking (HS), Lung Cancer (LC), Chronic Bronchitis (CB), Fatigue (F), and Weight Loss (WL), with HS as a direct cause of LC and CB, and with F and WL as effects further downstream]

An Example of the Accompanying Causal Bayesian Network Parameters
P(HS = no) = 0.80, P(HS = yes) = 0.20
P(CB = absent | HS = no) = 0.95, P(CB = present | HS = no) = 0.05
P(CB = absent | HS = yes) = 0.75, P(CB = present | HS = yes) = 0.25
P(LC = absent | HS = no) = , P(LC = present | HS = no) =
P(LC = absent | HS = yes) = 0.997, P(LC = present | HS = yes) = 0.003

Causal Markov Condition A node is independent of its non-effects given just its direct causes. This is the key representational property of causal Bayesian networks. Special case: a node is independent of its distant causes given just its direct causes. General notion: causality is local.
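Stated a bit more formally (a standard formulation of the condition, added here for reference), for each variable $X_i$ with parents $\mathrm{Pa}(X_i)$ in the causal graph:

$$X_i \;\perp\!\!\!\perp\; \mathrm{NonDescendants}(X_i) \;\mid\; \mathrm{Pa}(X_i)$$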

Causal Modeling Framework An underlying process generates entities that share the same causal network structure. The entities may have different parameters (probabilities). Each entity independently samples the joint distribution defined by its CBN to generate values (data) for each variable in the CBN model.
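To make the sampling step concrete, here is a minimal forward-sampling sketch in Python for a three-variable fragment of the example network (HS -> LC -> WL). The fragment's structure and P(HS) follow the earlier slides; the CPT entries for LC given HS = no and for WL are illustrative assumptions, since those values are not given in the slides.

```python
import random

# Minimal forward sampler for a fragment of the example CBN: HS -> LC -> WL.
# P(HS) and P(LC = present | HS = yes) come from the parameter slide above;
# the remaining probabilities are illustrative placeholders (assumptions).
P_HS_YES = 0.20
P_LC_PRESENT_GIVEN_HS = {"yes": 0.003, "no": 0.001}      # "no" entry is assumed
P_WL_PRESENT_GIVEN_LC = {"present": 0.5, "absent": 0.1}  # assumed

def sample_entity(rng):
    """Sample one entity's feature values in topological (cause-first) order."""
    hs = "yes" if rng.random() < P_HS_YES else "no"
    lc = "present" if rng.random() < P_LC_PRESENT_GIVEN_HS[hs] else "absent"
    wl = "present" if rng.random() < P_WL_PRESENT_GIVEN_LC[lc] else "absent"
    return (hs, lc, wl)

rng = random.Random(0)
print([sample_entity(rng) for _ in range(5)])  # five sampled (HS, LC, WL) tuples
```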

Entity Generator [Diagram: an entity-generating process producing entities 1, 2, 3, ..., each with feature values (HS, LC, WL); sampled values shown include (no, absent, absent), (yes, present, present), (yes, absent, absent), (no, absent, absent), and (yes, absent, absent)]

Discovering the Average Causal Bayesian Network [Diagram: an averaged network over HS_avg, LC_avg, and WL_avg]

Some Key Types of Causal Relationships [Diagram: direct causation (e.g., HS -> LC), indirect causation (e.g., HS -> ... -> WL), confounding (e.g., HS as a common cause of LC and CB), and sampling bias (conditioning on Sampled = true, a common effect of WL and F)]

Inference Using a Single CBN When Given Evidence in the Form of Observations [Diagram: the HS, LC, CB, F, WL network] P(F | CB = present, WL = present, CBN_1)

Inference The Markov Condition implies the following equation:

$$P(X_1, \ldots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)$$

The above equation specifies the full joint probability distribution over the model variables. From the joint distribution we can derive any conditional probability of interest.

Inference Algorithms In the worst case, the brute-force algorithm is exponential time in the number of variables in the model. Numerous exact inference algorithms have been developed that exploit independences among the variables in the causal Bayesian network. However, in the worst case, these algorithms are exponential time. Inference in causal Bayesian networks is NP-hard (Cooper, AIJ, 1990).
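As a concrete illustration of brute-force inference (enumerating the joint distribution and summing out the unobserved variables), here is a minimal Python sketch on a three-variable chain HS -> LC -> F; the network and all of its probabilities are illustrative assumptions, not the exact CPTs from the slides.

```python
from itertools import product

# Brute-force inference by enumeration over a tiny CBN chain: HS -> LC -> F.
# All probabilities below are illustrative assumptions.
def p_hs(hs):     return 0.2 if hs else 0.8
def p_lc(lc, hs): return (0.003 if lc else 0.997) if hs else (0.001 if lc else 0.999)
def p_f(f, lc):   return (0.7 if f else 0.3) if lc else (0.1 if f else 0.9)

def joint(hs, lc, f):
    # Markov factorization: P(HS, LC, F) = P(HS) * P(LC | HS) * P(F | LC)
    return p_hs(hs) * p_lc(lc, hs) * p_f(f, lc)

def query(target_f, evidence_lc):
    # P(F = target_f | LC = evidence_lc), summing out HS.  In general the
    # number of joint terms grows exponentially with the number of variables.
    num = sum(joint(hs, evidence_lc, target_f) for hs in (True, False))
    den = sum(joint(hs, evidence_lc, f) for hs, f in product((True, False), repeat=2))
    return num / den

print(query(target_f=True, evidence_lc=True))  # P(F = present | LC = present)
```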

Inference Using a Single CBN When Given Evidence in the Form of Manipulations P(F | M_CB = present, CBN_1) Let M_CB be a new variable that can have the same values as CB (present, absent) plus the value observe. Add an arc from M_CB to CB. Define the probability distribution of CB given its parents.
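One natural way to define that distribution (an illustrative choice corresponding to the deterministic manipulation on a later slide; stochastic manipulations are also possible) is:

$$P(CB = cb \mid HS, M_{CB}) \;=\; \begin{cases} P(CB = cb \mid HS) & \text{if } M_{CB} = \text{observe} \\ 1 & \text{if } M_{CB} = cb \\ 0 & \text{otherwise} \end{cases}$$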

Inference Using a Single CBN When Given Evidence in the Form of Manipulations [Diagram: the HS, LC, CB, F, WL network with an added node M_CB and an arc M_CB -> CB] P(F | M_CB = present, CBN_1)

A Deterministic Manipulation [Diagram: the HS, LC, CB, F, WL network with M_CB -> CB under a deterministic manipulation of CB] P(F | M_CB = present, CBN_1)

Inference Using a Single CBN When Given Evidence in the Form of Observations and Manipulations [Diagram: the HS, LC, CB, F, WL network with M_CB -> CB] P(F | M_CB = present, WL = present, CBN_1)

Inference Using Multiple CBNs: Model Averaging
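The model-averaging computation referred to here has the standard Bayesian form (shown as a sketch of the general idea, with X the query variable, E the observed evidence, and D the data used to learn the models; this notation is an assumption, not necessarily that of the original slide):

$$P(X \mid E, D) \;=\; \sum_i P(X \mid E, CBN_i)\, P(CBN_i \mid D)$$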

Some Key Reasons for Learning CBNs Scientific discovery among measured variables. Example of a general question: What are the causal relationships among HS, LC, CB, F, and WL? Example of a focused question: What are the causes of LC from among HS, CB, F, and WL? Scientific discovery of hidden processes. Prediction. Example: the effect of not smoking on contracting lung cancer.

Major Methods for Learning CBNs from Data Constraint-based methods: use tests of independence to find patterns of relationships among variables that support causal relationships; relatively efficient for discovering causal models with hidden variables (see the talk by Frederick Eberhardt this morning). Score-based methods: Bayesian scoring allows informative prior probabilities on causal structure and parameters; non-Bayesian scoring does not allow informative prior probabilities.

Learning CBNs from Observational Data: A Bayesian Formulation

$$P(S_i \mid D, K) \;=\; \frac{P(D \mid S_i, K)\, P(S_i \mid K)}{\sum_j P(D \mid S_j, K)\, P(S_j \mid K)}$$

where D is observational data, S_i is the structure of CBN_i, and K is background knowledge and belief.

Learning CBNs from Observational Data When There Are No Hidden Variables

$$P(S_i \mid D, K) \;=\; \frac{P(S_i \mid K) \int P(D \mid \theta_i, S_i, K)\, P(\theta_i \mid S_i, K)\, d\theta_i}{\sum_j P(S_j \mid K) \int P(D \mid \theta_j, S_j, K)\, P(\theta_j \mid S_j, K)\, d\theta_j}$$

where $\theta_i$ are the parameters associated with $S_i$ and the sum is over all CBNs for which $P(S_j \mid K) > 0$.

The BD Marginal Likelihood The previous integral has the following closed-form solution when we assume Dirichlet priors (with hyperparameters $\alpha_{ijk}$ and $\alpha_{ij}$), multinomial likelihoods ($N_{ijk}$ and $N_{ij}$ denote counts), parameter independence, and parameter modularity:

$$P(D \mid S, K) \;=\; \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}$$

where $n$ is the number of variables, $q_i$ is the number of parent configurations of $X_i$, $r_i$ is the number of values of $X_i$, $\alpha_{ij} = \sum_k \alpha_{ijk}$, and $N_{ij} = \sum_k N_{ijk}$.
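For illustration, here is a minimal Python sketch of this marginal likelihood, computed in log space for complete discrete data. The variable names, data layout, and the uniform per-cell Dirichlet hyperparameter are assumptions made for the example, not part of the slides.

```python
import math
from collections import Counter
from itertools import product

def log_bd_marginal_likelihood(data, structure, arities, alpha=1.0):
    """Log BD marginal likelihood of a DAG given complete discrete data.

    data: list of dicts mapping variable name -> value in {0, ..., r-1}
    structure: dict mapping variable name -> tuple of parent names
    arities: dict mapping variable name -> number of possible values
    alpha: Dirichlet hyperparameter per (variable, parent config, value) cell
    """
    total = 0.0
    for var, parents in structure.items():
        r = arities[var]
        # Count N_ijk for each parent configuration j and child value k
        counts = Counter((tuple(row[p] for p in parents), row[var]) for row in data)
        configs = product(*(range(arities[p]) for p in parents)) if parents else [()]
        for j in configs:
            n_ij = sum(counts[(j, k)] for k in range(r))
            a_ij = alpha * r
            total += math.lgamma(a_ij) - math.lgamma(a_ij + n_ij)
            for k in range(r):
                total += math.lgamma(alpha + counts[(j, k)]) - math.lgamma(alpha)
    return total

# Tiny usage example on the structure X -> Y with binary values
data = [{"X": 0, "Y": 0}, {"X": 1, "Y": 1}, {"X": 1, "Y": 1}, {"X": 0, "Y": 1}]
print(log_bd_marginal_likelihood(data, {"X": (), "Y": ("X",)}, {"X": 2, "Y": 2}))
```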

Searching for Network Structures Greedy search is often used. Hybrid methods have been explored that combine constraint-based tests and scoring. Some algorithms guarantee locating the generating model in the large-sample limit (assuming the Markov and Faithfulness conditions), for example the GES algorithm (Chickering, JMLR, 2002). The ability to approximate the generating network is often quite good. An excellent discussion and evaluation of several state-of-the-art methods, including a relatively new method (Max-Min Hill Climbing), is in Tsamardinos, Brown, Aliferis, Machine Learning, 2006.
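A minimal sketch of greedy hill-climbing over DAG structures follows. The scoring function here is a stand-in chosen only to make the example runnable; in practice it would be the BD marginal likelihood above or another decomposable score, and real implementations add many refinements (caching, decomposability, restarts).

```python
from itertools import permutations

def is_acyclic(edges, nodes):
    """Kahn-style check: repeatedly remove nodes that have no incoming edges."""
    remaining, es = set(nodes), set(edges)
    while remaining:
        no_incoming = [n for n in remaining if not any(c == n for _, c in es)]
        if not no_incoming:
            return False  # every remaining node has an incoming edge: a cycle
        remaining -= set(no_incoming)
        es = {(p, c) for p, c in es if p in remaining and c in remaining}
    return True

def greedy_search(nodes, score):
    """First-improvement hill climbing over edge additions, deletions, reversals."""
    edges = set()
    best = score(edges)
    improved = True
    while improved:
        improved = False
        candidates = []
        for a, b in permutations(nodes, 2):
            if (a, b) in edges:
                candidates.append(edges - {(a, b)})               # delete a -> b
                candidates.append((edges - {(a, b)}) | {(b, a)})  # reverse a -> b
            else:
                candidates.append(edges | {(a, b)})               # add a -> b
        for cand in candidates:
            s = score(cand)
            if is_acyclic(cand, nodes) and s > best:
                edges, best, improved = cand, s, True
                break
    return edges

# Stand-in score that simply prefers a particular target edge set (illustrative
# only; replace with a data-based score such as the BD score).
target = {("HS", "LC"), ("LC", "F")}
score = lambda e: -len(e ^ target)
print(greedy_search(["HS", "LC", "F"], score))
```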

The Complexity of Search Given a complete dataset and no hidden variables, locating the Bayesian network structure that has the highest posterior probability is NP-hard (Chickering, AIS, 1996; Chickering, et al., JMLR, 2004).

We Can Learn More from Observational and Experimental Data Together than from Either One Alone [Diagram: a causal structure over C, E, and a hidden variable H] We cannot learn the above causal structure from observational or experimental data alone. We need both.

Learning CBNs from Observational Data When There Are Hidden Variables

$$P(S_i \mid D, K) \;=\; \frac{P(S_i \mid K) \sum_{H_i} P(D, H_i \mid S_i, K)}{\sum_j P(S_j \mid K) \sum_{H_j} P(D, H_j \mid S_j, K)}$$

where $H_i$ ($H_j$) are the hidden variables in $S_i$ ($S_j$) and the sum in the numerator (denominator) is taken over all values of $H_i$ ($H_j$).

Learning CBNs from Observational and Experimental Data: A Bayesian Formulation
For each model variable X_i that is experimentally manipulated in at least one case, introduce a potential parent M_X_i of X_i. X_i can also have parents from among the other domain variables {X_1, ..., X_{i-1}, X_{i+1}, ..., X_n} in the model. Priors on the distribution of X_i will include conditioning on M_X_i, when it is a parent of X_i, as well as conditioning on the other parents of X_i.
Define M_X_i to have the same values v_i1, v_i2, ..., v_iq as X_i, plus a value observe.
When M_X_i has value v_ij in a given case, this represents that the experimenter intended to manipulate X_i to have value v_ij in that case.
When M_X_i has value observe in a given case, this represents that no attempt was made by the experimenter to manipulate X_i; rather, X_i was merely observed to have the value recorded for it.
With the above variable additions in place, use the previous Bayesian methods for causal modeling from observational data.

An Example Database Containing Observations and Manipulations
HS | M_CB | CB | LC | F | WL
T  | obs  | T  | F  | T | T
F  | F    | F  | T  | T | F
F  | F    | T  | T  | F | F
T  |      | F  | F  | T | F

Faithfulness Condition Any independence among variables in the data-generating distribution follows from the Markov Condition applied to the data-generating causal structure. A simple counterexample: [Diagram: a structure over C, E, and a hidden variable H]
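To make the counterexample concrete, consider an illustrative linear version (an assumption added here, since the slide's diagram is not fully recoverable): suppose H causes both C and E, and C also causes E, with

$$C = a H + \varepsilon_C, \qquad E = b C + c H + \varepsilon_E ,$$

where H, $\varepsilon_C$, and $\varepsilon_E$ are independent with mean zero. Then $\mathrm{Cov}(C, E) = b\,\mathrm{Var}(C) + a c\,\mathrm{Var}(H)$, which equals zero when $b = -\,a c\,\mathrm{Var}(H)/\mathrm{Var}(C)$. For that parameter setting C and E are uncorrelated (independent, in the Gaussian case) even though C is a direct cause of E, so the independence does not follow from the Markov Condition applied to the causal structure. Faithfulness rules out such exact cancellations.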

Challenges of Bayesian Learning of Causal Networks Major challenges: large search spaces, hidden variables, feedback, assessing parameter and structure priors, and modeling complicated distributions. The remainder of this talk will summarize several methods for dealing with hidden variables, which is arguably the biggest challenge today. These examples provide only a small sample of previous research.

Learning Belief Networks in the Presence of Missing Values and Hidden Variables (N. Friedman, ICML, 1997) Assumes a fixed set of measured and hidden variables. Uses Expectation Maximization (EM) to fill in the values of the hidden variables. Uses BIC to score causal network structures with the filled-in data; greedily finds the best structure and then returns to the EM step using this new structure. Some subsequent work: using patterns of induced relationships among the measured variables to suggest where to introduce hidden variables (Elidan, et al., NIPS, 2000); determining the cardinality of the hidden variables introduced (Elidan & Friedman, UAI, 2001).

A Non-Parametric Bayesian Method for Inferring Hidden Causes (Wood, et al., UAI, 2006) Learns hidden causes of measured variables. Assumes binary variables and noisy-OR interactions. Uses MCMC to sample the hidden structures. Allows in principle an infinite number of hidden variables; in practice, the number of hidden variables in the optimal structure is constrained by the measured data. [Diagram: hidden variables with arcs to measured variables]

Bayesian Learning of Measurement and Structural Models (Silva & Scheines, ICML, 2006) Learns models of the following type: assumes continuous variables, mixtures of Gaussian distributions, and linear interactions. [Diagram: hidden variables with arcs to measured variables]

Mixed Ancestral Graphs* A MAG(G) is a graphical object that contains only the observed variables, causal arcs, and a new relationship for representing hidden confounding. There exist methods for scoring linear MAGs (Richardson & Spirtes, Ancestral Graph Markov Models, Annals of Statistics, 2002). [Diagram: a latent variable DAG over SES, SEX, PE, IQ, and CP with latent variables L1 and L2, and the corresponding MAG over the observed variables] * This slide was adapted from a slide provided by Peter Spirtes.

A Theoretical Study of Y Structures for Causal Discovery (Mani, Spirtes, Cooper, UAI, 2006) Learn a Bayesian network structure on the measured variables. Identify patterns in the structure that suggest causal relationships. The Y structure shown in green supports that D is an unconfounded cause of F. [Diagram: a network over A, B, C, D, E, and F with a Y structure involving D and F highlighted in green]

Causal Discovery Using Subsets of Variables Search for an estimate M of the Markov blanket of a variable X (e.g., Aliferis, et al., AMIA, 2002). X is independent of other variables in the generating causal network model, conditioned on the variables in X's Markov blanket. Within M, search for patterns among the variables that suggest a causal relationship to X (e.g., Mani, doctoral dissertation, University of Pittsburgh, 2006).

Causal Identifiability Generally depends upon: the Markov Condition, the Faithfulness Condition, and informative structural relationships among the measured variables. Example of the Y structure: [Diagram: a Y structure in which A and B are parents of C, and C is a parent of E]

Evaluation of Causal Discovery In evaluating a classifier, the correct answer in any instance is just the value of some variable of interest, which typically is explicitly in the data set. This makes evaluation relatively straightforward. In evaluating the output of a causal discovery algorithm, the answer is not in the dataset. In general we need some outside knowledge to confirm that the causal output is correct. This makes evaluation relatively difficult. Thus, causal discovery algorithms have not been thoroughly evaluated.

Methods for Evaluating Causal Discovery Algorithms Simulated data. Real data with expert judgments of causation. Real data with previously validated causal relationships. Real data with follow-up experiments.

An Example of an Evaluation Using Simulated Data (Mani, poster here) Generated 20,000 observational data samples from each of five CBNs that were manually constructed. Applied the BLCD algorithm, which considers many 4-variable subsets of all the variables and applies Bayesian scoring; it is based on the causal properties of Y structures. Results: precision 83%, recall 27%.

An Example of an Evaluation Using Previously Validated Causal Relationships (Yoo, et al., PSB, 2002) ILVS is a Bayesian method that considers pairwise relationships among a set of variables. It works best when given both observational and experimental data. ILVS was applied to a previously collected DNA microarray dataset on 9 genes that control galactose metabolism in yeast (Ideker, et al., Science, 2001). The causal relationships among these genes have been extensively studied and reported in the literature. ILVS predicted 12 of 27 known causal relationships among the genes (44% recall), and of those 12, eight were correct (67% precision). Yoo has explored numerous extensions to ILVS.

An Example of an Evaluation Using Real Data with Follow-Up Experiments (Sachs, et al., Science, 2005) Experimentally manipulated human immune system cells. Used flow cytometry to measure the effects on 11 proteins and phospholipids in a large number of individual cells. Used a Bayesian method for causal learning from observational and experimental data. Derived 17 causal relationships with high probability: 15 were highly supported by the literature (precision = 15/17 = 88%), and the other two were confirmed experimentally by the authors (precision = 17/17 = 100%). Three causal relationships were missed (recall = 17/20 = 85%).

A Possible Approach to Combining Causal Discovery and Feature Selection
1. Use prior knowledge and statistical associations to develop overlapping groups of features (variables)
2. Derive causal probabilistic relationships within groups
3. Have the causal groups constrain each other
4. Determine additional groups of features that might constrain causal relationships further
5. Either go to step 2 or step 6
6. Model average within and across groups to derive approximate model-averaged causal relationships
David Danks, Learning the Causal Structure of Overlapping Variable Sets. In S. Lange, K. Satoh, & C.H. Smith, eds., Discovery Science: Proceedings of the 5th International Conference. Berlin: Springer-Verlag. pp

Some Suggestions for Further Information Books: Glymour, Cooper (eds), Computation, Causation, and Discovery (MIT Press, 1999); Pearl, Causality: Models, Reasoning, and Inference (Cambridge University Press, 2000); Spirtes, Glymour, Scheines, Causation, Prediction, and Search (MIT Press, 2001); Neapolitan, Learning Bayesian Networks (Prentice Hall, 2003). Conferences: UAI, ICML, NIPS, AAAI, IJCAI. Journals: JMLR, Machine Learning.

Acknowledgement Thanks to Peter Spirtes for his comments on an outline of this talk