Incorporating Prior Information in Causal Discovery
Rodney O'Donnell, Jahangir Alam, Bin Han, Kevin Korb and Ann Nicholson

Outline
Methods for learning causal models
– Data mining, Elicitation, Hybrid approach
Algorithms for learning causal models
– Constraint based
– Metric based (including our CaMML)
Incorporating priors into CaMML
– 5 different types of priors
Experimental Design
Experimental Results

Learning Causal Bayesian Networks
Elicitation:
– Requires domain knowledge
– Expensive and time-consuming
– Partial knowledge may be insufficient
Data mining:
– Requires a large dataset
– Sometimes the algorithms are "stupid" (no prior knowledge → no common sense)
– Data only tells part of the story

A hybrid approach
– Combine the domain knowledge and the facts learned from data
– Minimize the expert's effort in domain knowledge elicitation
– Enhance the efficiency of the learning process: reduce / bias the search space
[Diagram: Elicitation + Data Mining → Causal BN]

Objectives
– Generate different prior specification methods
– Comparatively study the influence of priors on BN structure learning
– Future: apply the methods to the Heart Disease modeling project

Causal learning algorithms
Constraint based
– Pearl & Verma's algorithm, PC
Metric based
– MML, MDL, BIC, BDe, K2, K2+MWST, GES, CaMML
Priors on structure
– Optional vs. Required
– Hard vs. Soft

Priors on structure
– K2 (BNT): required, hard
– K2+MWST (BNT): optional, hard
– GES (Tetrad): optional, hard
– PC (Tetrad): optional, hard
– CaMML: optional, hard or soft

CaMML
MML metric based; MML vs. MDL:
– MML can be derived from Bayes' Theorem (Wallace)
– MDL is a non-Bayesian method
Search: MCMC sampling through TOM space
– TOM = DAG + total ordering
– TOM is finer than DAG
E.g., the DAG A→B, A→C admits two TOMs (orderings ABC and ACB), while the chain A→B→C admits only one TOM (ABC).
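To make the TOM/DAG distinction concrete, here is a small Python sketch (not CaMML's implementation) that enumerates a DAG's TOMs as the total orderings consistent with its arcs:

```python
from itertools import permutations

def toms(nodes, arcs):
    """Return the TOMs of a DAG: all total orderings of its nodes
    in which every arc points from an earlier to a later node."""
    return [order for order in permutations(nodes)
            if all(order.index(a) < order.index(b) for a, b in arcs)]

# Fork A->B, A->C: two TOMs (orderings ABC and ACB)
fork = toms("ABC", [("A", "B"), ("A", "C")])
# Chain A->B->C: one TOM (ordering ABC)
chain = toms("ABC", [("A", "B"), ("B", "C")])
```

Because several orderings can share one DAG, sampling in TOM space implicitly weights DAGs by how many orderings they admit.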

Priors in CaMML: arcs
Experts may provide priors on pairwise relations:
1. Directed arcs
– e.g. {A→B 0.7} (soft)
– e.g. {A→D 1.0} (hard)
2. Undirected arcs
– e.g. {A─C 0.6} (soft)
3. Combinations, e.g. {A→B 0.7; B→A 0.8; A─C 0.6}
– Represented by two adjacency matrices: one for directed arcs, one for undirected arcs

Priors in CaMML: arcs (continued)
Expert-specified prior: {A→B 0.7; B→A 0.8; A─C 0.6}
One candidate network: A→B, with C unconnected
MML cost for each pair:
– AB: −log(0.7) − log(1−0.8)  (A→B present, B→A absent)
– AC: −log(1−0.6)  (A─C absent)
– BC: −log(default arc prior)
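The arc-prior cost can be sketched in a few lines of Python. This is an illustrative reconstruction, not CaMML's code, and for brevity it omits the default-prior term charged to unstated pairs such as BC:

```python
import math

def arc_prior_cost(candidate_arcs, directed_priors, undirected_priors):
    """Cost of a candidate DAG's arcs under expert pairwise priors:
    pay -log(p) where the candidate agrees the stated relation is
    present, -log(1-p) where it is absent."""
    cost = 0.0
    for (a, b), p in directed_priors.items():
        present = (a, b) in candidate_arcs
        cost += -math.log(p if present else 1.0 - p)
    for (a, b), p in undirected_priors.items():
        present = (a, b) in candidate_arcs or (b, a) in candidate_arcs
        cost += -math.log(p if present else 1.0 - p)
    return cost

# Slide example: prior {A->B 0.7; B->A 0.8; A-C 0.6}, candidate A->B only.
cost = arc_prior_cost({("A", "B")},
                      {("A", "B"): 0.7, ("B", "A"): 0.8},
                      {("A", "C"): 0.6})
# cost = -log(0.7) - log(1 - 0.8) - log(1 - 0.6)
```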

Priors in CaMML: Tiers
The expert can also provide a prior on an additional pairwise relation.
Tier: temporal ordering of variables
E.g., tiers {A>>C 0.6; B>>C 0.8}
One possible TOM has ordering A, C, B: A>>C is satisfied but B>>C is violated, so
I_MML(h) = −log(0.6) − log(1−0.8)
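A corresponding sketch for tier priors (again illustrative, not CaMML's code): score a TOM's ordering against each stated precedence.

```python
import math

def tier_prior_cost(order, tier_priors):
    """Cost of a total ordering under tier priors: 'a >> b with prob p'
    says a should precede b; pay -log(p) on agreement, -log(1-p) on
    disagreement."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(-math.log(p if pos[a] < pos[b] else 1.0 - p)
               for (a, b), p in tier_priors.items())

# Slide example: tiers {A>>C 0.6; B>>C 0.8}, candidate ordering A, C, B.
cost = tier_prior_cost("ACB", {("A", "C"): 0.6, ("B", "C"): 0.8})
# cost = -log(0.6) - log(1 - 0.8)
```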

Priors in CaMML: edPrior
The expert specifies a single network, plus a confidence
– e.g. EdConf = 0.7
The prior is based on the edit distance from this network.
E.g., for a candidate network at edit distance 2 from the expert's network:
I_MML(h) = −2 × (log(0.7) − log(1−0.7))
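The edit distance itself can be sketched as a pairwise comparison of the two DAGs. The networks below are hypothetical, and the per-edit penalty is written as log(EdConf / (1 − EdConf)), i.e. the slide's formula up to sign convention:

```python
import math

def edit_distance(arcs1, arcs2):
    """Pairwise edit distance between two DAGs: for each unordered node
    pair, count one edit if the DAGs disagree (arc added, deleted, or
    reversed)."""
    def rel(arcs, x, y):
        if (x, y) in arcs:
            return "->"
        if (y, x) in arcs:
            return "<-"
        return "none"
    nodes = sorted({n for arc in arcs1 | arcs2 for n in arc})
    return sum(1 for i, x in enumerate(nodes) for y in nodes[i + 1:]
               if rel(arcs1, x, y) != rel(arcs2, x, y))

expert    = {("A", "B"), ("B", "C")}   # hypothetical expert DAG A->B->C
candidate = {("B", "A")}               # one reversal plus one deletion
ed = edit_distance(expert, candidate)  # 2
penalty = ed * math.log(0.7 / (1 - 0.7))  # per-edit penalty at EdConf = 0.7
```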

Priors in CaMML: KTPrior
Again, the expert specifies a single network plus a confidence
– e.g. KTConf = 0.7
The prior is based on the Kendall-Tau edit distance from this network:
– KTEditDist = KT + undirected ED
E.g., expert DAG A→B→C (TOM ordering ABC) vs. a candidate TOM with ordering ACB: the B–C order in the expert TOM disagrees with the candidate TOM, so KT = 1; with an undirected ED of 2, KTEditDist = 1 + 2 = 3 and
I_MML(h) = −3 × (log(0.7) − log(1−0.7))
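The KT component is just the number of node pairs whose relative order differs between the two orderings; a minimal sketch:

```python
from itertools import combinations

def kendall_tau(order1, order2):
    """Kendall-Tau distance between two total orderings: the number of
    node pairs whose relative order differs."""
    p1 = {v: i for i, v in enumerate(order1)}
    p2 = {v: i for i, v in enumerate(order2)}
    return sum(1 for a, b in combinations(order1, 2)
               if (p1[a] < p1[b]) != (p2[a] < p2[b]))

# Slide example: orderings ABC and ACB disagree only on the B-C pair.
kt = kendall_tau("ABC", "ACB")  # 1
```

KTEditDist then adds the undirected edit distance between the two skeletons (2 in the slide's example), giving 3.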

Experiment 1: Design
Prior:
– weak, strong
– correct, incorrect
Size of dataset:
– 100, 1000, 10k and 100k
– For each size we randomly generate 30 datasets
Algorithms:
– CaMML
– K2 (BNT)
– K2+MWST (BNT)
– GES (Tetrad)
– PC (Tetrad)
Models: AsiaNet, "Model6" (an artificial model)

Models: AsiaNet and “Model6”

Experimental Design
[Diagram: Priors × Algorithms × Sample Size]

Experiment Design: Evaluation
– ED: difference between structures
– KL: difference between distributions
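For the KL measure, a minimal sketch of the divergence between two discrete distributions (here over joint states, with made-up numbers):

```python
import math

def kl_divergence(p, q):
    """KL(P||Q) = sum_x P(x) * log(P(x)/Q(x)) over discrete states."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

true_dist    = [0.5, 0.3, 0.2]   # true network's distribution (made up)
learned_dist = [0.4, 0.4, 0.2]   # learned network's distribution (made up)
d = kl_divergence(true_dist, learned_dist)  # > 0; 0 iff the two match
```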

Model6 (1000 samples)

Model6 (10k samples)

AsiaNet (1000 Samples)

Experiment 1: Results
With default priors: CaMML is comparable to, or outperforms, the other algorithms
With full tiers:
– There is no statistically significant difference between CaMML and K2
– GES is slightly behind; PC performs poorly
CaMML is the only method allowing soft priors:
– With a prior of 0.7, CaMML is comparable to the other algorithms given full tiers
– With a stronger prior, CaMML performs better
CaMML performs significantly better with the expert's priors than with uniform priors

Experiment 2: Is CaMML well calibrated?
Biased prior:
– The expert's confidence may not be consistent with the expert's skill, e.g. the expert is 0.99 sure, but wrong, about a connection
– A biased hard prior cannot be recovered from
– With a soft prior, the data will eventually overcome the bad prior

Is CaMML well calibrated?
Question: Does CaMML reward well-calibrated experts?
Experimental design:
– Objective measure: how good is the proposed structure? ED: 0–14
– Subjective measure: the expert's confidence (0.5 to …)
– How good is the learned structure? KL distance

Effect of expert skill and confidence on quality of learned model
[Plot: Better ← Expert Skill → Worse; regions labelled "Overconfidence penalized", "Justified confidence rewarded", "Unconfident expert"]

Experiment 2: Results
– CaMML improves the elicited structure and approaches the true structure
– CaMML improves when the expert's confidence matches the expert's skill

Conclusions
– CaMML is comparable to other algorithms when given equivalent prior knowledge
– CaMML can incorporate more flexible prior knowledge
– CaMML's results improve when the expert is skillful or well calibrated

Thanks