Incorporating Prior Information in Causal Discovery
Rodney O'Donnell, Jahangir Alam, Bin Han, Kevin Korb and Ann Nicholson
Outline
– Methods for learning causal models: data mining, elicitation, hybrid approach
– Algorithms for learning causal models: constraint based; metric based (including our CaMML)
– Incorporating priors into CaMML: 5 different types of priors
– Experimental design
– Experimental results
Learning Causal Bayesian Networks
Elicitation:
– Requires domain knowledge
– Expensive and time-consuming
– Partial knowledge may be insufficient
Data mining:
– Requires a large dataset
– Sometimes the algorithms are "stupid" (no prior knowledge → no common sense)
– Data only tells part of the story
A hybrid approach
– Combine the domain knowledge and the facts learned from data
– Minimize the expert's effort in domain knowledge elicitation
– Enhance the efficiency of the learning process: reduce / bias the search space
[Diagram: Elicitation + Data Mining → Causal BN]
Objectives
– Generate different prior specification methods
– Comparatively study the influence of priors on BN structure learning
– Future: apply the methods to the Heart Disease modeling project
Causal learning algorithms
– Constraint based: Pearl & Verma's algorithm, PC
– Metric based: MML, MDL, BIC, BDe, K2, K2+MWST, GES, CaMML
Priors on structure:
– Optional vs. required
– Hard vs. soft
Priors on structure
– K2 (BNT): required, hard
– K2+MWST (BNT): optional, hard
– GES (Tetrad): optional, hard
– PC (Tetrad): optional, hard
– CaMML: optional, hard or soft
CaMML
– MML metric based
– MML vs. MDL: MML can be derived from Bayes' Theorem (Wallace); MDL is a non-Bayesian method
– Search: MCMC sampling through TOM space
– TOM = DAG + total ordering; TOM is finer than DAG
– E.g., the DAG A→B, A→C has two TOMs (orderings ABC and ACB), while the chain A→B→C has only one TOM (ABC)
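As a minimal sketch of why TOM space is finer than DAG space, the snippet below enumerates the total orderings consistent with a DAG (the function name and representation are our own, not CaMML's):

```python
from itertools import permutations

def toms_of_dag(nodes, arcs):
    """Yield every TOM of the DAG: a total ordering of the nodes
    in which each arc's parent precedes its child."""
    for order in permutations(nodes):
        pos = {v: i for i, v in enumerate(order)}
        if all(pos[parent] < pos[child] for parent, child in arcs):
            yield "".join(order)

# DAG with arcs A->B and A->C: two TOMs, ABC and ACB
print(list(toms_of_dag("ABC", [("A", "B"), ("A", "C")])))   # ['ABC', 'ACB']
# Chain A->B->C: one TOM, ABC
print(list(toms_of_dag("ABC", [("A", "B"), ("B", "C")])))   # ['ABC']
```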
Priors in CaMML: arcs
Experts may provide priors on pairwise relations:
1. Directed arcs, e.g. {A→B 0.7} (soft), {A→D 1.0} (hard)
2. Undirected arcs, e.g. {A─C 0.6} (soft)
3. Combinations, e.g. {A→B 0.7; B→A 0.8; A─C 0.6}, represented by two adjacency matrices (one for directed arcs, one for undirected arcs)
Priors in CaMML: arcs (continued)
Expert-specified priors: {A→B 0.7; B→A 0.8; A─C 0.6}
One candidate network: A→B, B→C
MML cost for each pair:
– AB: −log(0.7) − log(1−0.8)
– AC: −log(1−0.6)
– BC: −log(default arc prior)
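A rough sketch of this pairwise costing, using our own representation of the priors (dictionaries rather than CaMML's adjacency matrices, and an assumed default arc prior of 0.5):

```python
import math
from itertools import combinations

DIRECTED = {("A", "B"): 0.7, ("B", "A"): 0.8}  # P(arc X->Y present)
UNDIRECTED = {frozenset("AC"): 0.6}            # P(X and Y adjacent)
DEFAULT_ARC_PRIOR = 0.5                        # assumed default, not from the slides

def arc_prior_cost(cand_arcs, nodes):
    """MML cost (-log probability) of a candidate arc set under the priors."""
    cost = 0.0
    for (x, y), p in DIRECTED.items():
        cost -= math.log(p if (x, y) in cand_arcs else 1.0 - p)
    for pair, p in UNDIRECTED.items():
        x, y = tuple(pair)
        adjacent = (x, y) in cand_arcs or (y, x) in cand_arcs
        cost -= math.log(p if adjacent else 1.0 - p)
    # pairs with no expert prior pay the default arc prior
    for x, y in combinations(nodes, 2):
        specified = ((x, y) in DIRECTED or (y, x) in DIRECTED
                     or frozenset((x, y)) in UNDIRECTED)
        if not specified:
            adjacent = (x, y) in cand_arcs or (y, x) in cand_arcs
            cost -= math.log(DEFAULT_ARC_PRIOR if adjacent
                             else 1.0 - DEFAULT_ARC_PRIOR)
    return cost

# Candidate network A->B, B->C over nodes A, B, C:
print(arc_prior_cost({("A", "B"), ("B", "C")}, "ABC"))
# = -log(0.7) - log(1-0.8) - log(1-0.6) - log(default)
```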
Priors in CaMML: Tiers
– The expert can provide a prior on an additional pairwise relation
– Tier: temporal ordering of variables
– E.g., tiers {A>>C 0.6; B>>C 0.8}
– One possible TOM: ordering A, C, B
– I_MML(h) = −log(0.6) − log(1−0.8)  (A>>C holds; B>>C is violated)
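A sketch of the tier cost on a TOM's total ordering (function name and representation are our own):

```python
import math

def tier_cost(ordering, tiers):
    """MML cost of a TOM's total ordering under tier priors.
    tiers: {(X, Y): p} meaning 'X before Y' with probability p."""
    pos = {v: i for i, v in enumerate(ordering)}
    cost = 0.0
    for (x, y), p in tiers.items():
        cost -= math.log(p if pos[x] < pos[y] else 1.0 - p)
    return cost

# Ordering A, C, B: A>>C holds (cost -log 0.6), B>>C fails (cost -log 0.2)
print(tier_cost("ACB", {("A", "C"): 0.6, ("B", "C"): 0.8}))
```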
Priors in CaMML: edPrior
– The expert specifies a single network, plus a confidence, e.g. EdConf = 0.7
– The prior is based on edit distance (ED) from this network
– E.g., for a candidate network at ED = 2 from the expert's network:
  I_MML(h) = −2 × (log 0.7 − log(1−0.7))
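A simplified sketch of the edit-distance prior, counting each arc addition, deletion, or reversal as one edit (our reading; CaMML's exact edit counting may differ):

```python
import math

def edit_distance(arcs1, arcs2):
    """Arc additions and deletions count as one edit each;
    a reversal (X->Y vs. Y->X) counts as one edit, not two."""
    diff = arcs1 ^ arcs2            # arcs present in exactly one network
    edits, seen = 0, set()
    for arc in diff:
        if arc in seen:
            continue
        reverse = (arc[1], arc[0])
        if reverse in diff:
            seen.add(reverse)       # pair X->Y with Y->X as one reversal
        edits += 1
    return edits

def ed_prior_cost(cand_arcs, expert_arcs, conf=0.7):
    ed = edit_distance(cand_arcs, expert_arcs)
    return -ed * (math.log(conf) - math.log(1.0 - conf))  # mirrors the slide's formula

# Expert network A->B, B->C; candidate B->A (one reversal + one deletion): ED = 2
print(ed_prior_cost({("B", "A")}, {("A", "B"), ("B", "C")}))
```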
Priors in CaMML: KTPrior
– Again, the expert specifies a single network, plus a confidence, e.g. KTConf = 0.7
– The prior is based on the Kendall-Tau edit distance from this network: KTEditDist = KT + undirected ED
– E.g., expert-specified DAG with TOM ordering ABC; a candidate TOM with ordering ACB
– The B–C order in the expert TOM disagrees with the candidate TOM, so KT = 1
– KTEditDist = KT (1) + undirected ED (2) = 3
– I_MML(h) = −3 × (log 0.7 − log(1−0.7))
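The Kendall-Tau component counts the variable pairs whose order differs between the two TOMs; a minimal sketch:

```python
from itertools import combinations

def kendall_tau(order1, order2):
    """Number of variable pairs ranked in opposite order by the two orderings."""
    pos1 = {v: i for i, v in enumerate(order1)}
    pos2 = {v: i for i, v in enumerate(order2)}
    return sum(1 for x, y in combinations(order1, 2)
               if (pos1[x] < pos1[y]) != (pos2[x] < pos2[y]))

# Expert TOM ordering ABC vs. candidate ACB: only the B-C pair disagrees
print(kendall_tau("ABC", "ACB"))  # 1  -> KTEditDist = 1 + undirected ED
```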
Experiment 1: Design
– Priors: weak vs. strong; correct vs. incorrect
– Size of dataset: 100, 1000, 10k and 100k; for each size we randomly generate 30 datasets
– Algorithms: CaMML, K2 (BNT), K2+MWST (BNT), GES (Tetrad), PC (Tetrad)
– Models: AsiaNet, "Model6" (an artificial model)
Models: AsiaNet and "Model6"
[Figure: network structures of AsiaNet and Model6]
Experimental Design
[Diagram: priors × algorithms × sample size]
Experiment Design: Evaluation
– ED (edit distance): difference between structures
– KL (Kullback-Leibler divergence): difference between distributions
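For reference, a toy sketch of the KL evaluation (illustrative numbers only, not from the experiments):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) = sum over joint states of P(x) * log(P(x) / Q(x))."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Joint distribution of the true model vs. a learned model (toy numbers):
true_joint    = [0.40, 0.30, 0.20, 0.10]
learned_joint = [0.35, 0.30, 0.25, 0.10]
print(kl_divergence(true_joint, learned_joint))  # ~0.0088
```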
Model6 (1000 samples) [results figure]
Model6 (10k samples) [results figure]
AsiaNet (1000 samples) [results figure]
Experiment 1: Results
– With default priors, CaMML is comparable to or outperforms the other algorithms
– With full tiers, there are no statistically significant differences between CaMML and K2; GES is slightly behind, and PC performs poorly
– CaMML is the only method allowing soft priors: with a prior of 0.7, CaMML is comparable to the other algorithms with full tiers; with a stronger prior, CaMML performs better
– CaMML performs significantly better with the expert's priors than with uniform priors
Experiment 2: Is CaMML well calibrated?
– Biased prior: the expert's confidence may not be consistent with the expert's skill, e.g. the expert is 0.99 sure, but wrong, about a connection
– A biased hard prior cannot be overridden; with a soft prior, the data will eventually overcome the bad prior
Is CaMML well calibrated?
– Question: does CaMML reward well calibrated experts?
Experimental design:
– Objective measure: how good is the proposed structure? ED: 0–14
– Subjective measure: the expert's confidence, from 0.5 to …
– How good is the learned structure? KL distance
Effect of expert skill and confidence on quality of learned model
[Figure: model quality vs. expert skill (better → worse); annotations: overconfidence penalized, justified confidence rewarded, unconfident expert]
Experiment 2: Results
– CaMML improves on the elicited structure and approaches the true structure
– CaMML improves when the expert's confidence matches the expert's skill
Conclusions
– CaMML is comparable to other algorithms when given equivalent prior knowledge
– CaMML can incorporate more flexible prior knowledge
– CaMML's results improve when the expert is skillful or well calibrated
Thanks