Learning Bayes Nets Based on Conditional Dependencies. Oliver Schulte, Department of Philosophy and School of Computing Science, Simon Fraser University, Vancouver, Canada. With Wei Luo (Simon Fraser) and Russ Greiner (U of Alberta).
Learning Bayes Nets Based on Conditional Dependencies 2/28 Outline 1. Brief Intro to Bayes Nets 2. Combining Dependency Information with Model Selection 3. Learning from Dependency Data Only: Learning-Theoretic Analysis
Learning Bayes Nets Based on Conditional Dependencies 3/28 Bayes Nets: Overview Bayes Net Structure = Directed Acyclic Graph. Nodes = Variables of Interest. Arcs = direct “influence”, “association”. Parameters = CP Tables = Prob of Child given Parents. Structure represents (in)dependencies. Structure + parameters represents joint probability distribution over variables.
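For concreteness, a minimal illustrative sketch (not from the slides; the three-node net, the CP table numbers, and the function name are made up) of how a structure plus CP tables determines the joint distribution: a full assignment gets probability equal to the product, over all nodes, of the child's CP entry given its parents' values.

```python
# Hypothetical 3-node net A -> B -> C with binary variables.
# Structure: parent map.  Parameters: CP tables (child value given parent values).
parents = {"A": [], "B": ["A"], "C": ["B"]}

# CP tables: (tuple of parent values, child value) -> probability.
cpt = {
    "A": {((), 1): 0.3, ((), 0): 0.7},
    "B": {((1,), 1): 0.9, ((1,), 0): 0.1, ((0,), 1): 0.2, ((0,), 0): 0.8},
    "C": {((1,), 1): 0.5, ((1,), 0): 0.5, ((0,), 1): 0.05, ((0,), 0): 0.95},
}

def joint_probability(assignment):
    """Probability of a full assignment {var: value} via the chain rule
    P(x1,...,xn) = prod_i P(xi | parents(xi))."""
    p = 1.0
    for var, parent_list in parents.items():
        parent_vals = tuple(assignment[q] for q in parent_list)
        p *= cpt[var][(parent_vals, assignment[var])]
    return p

print(joint_probability({"A": 1, "B": 1, "C": 0}))  # 0.3 * 0.9 * 0.5 = 0.135
```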
Learning Bayes Nets Based on Conditional Dependencies 4/28 Examples from CIspace (UBC)
Learning Bayes Nets Based on Conditional Dependencies 5/28 Graphs entail Dependencies. [Figure: three example graphs over nodes A, B, C, each shown with the conditional dependencies it entails, e.g. Dep(A,B), Dep(A,B|C) for one graph, and Dep(A,B), Dep(A,B|C), Dep(B,C), Dep(B,C|A), Dep(A,C|B) for another.]
Learning Bayes Nets Based on Conditional Dependencies 6/28 I-maps and Probability Distributions. Defn: Graph G is an I-map of prob dist P iff: if Dependent(X,Y|S) in P, then X is d-connected to Y given S in G. Example: If Dependent(Father Eye Color, Mother Eye Color | Child Eye Color) in P, then Father EC is d-connected to Mother EC given Child EC in G. Informally, G is an I-map of P iff G entails all conditional dependencies in P. Theorem: Fix G, P. There is a parameter setting θ for G such that (G, θ) represents P iff G is an I-map of P.
Two Approaches to Learning Bayes Net Structure. "Search and score": select graph G as a "model" with parameters to be estimated. "Test and cover": find G that represents the dependencies in P. Common aim: find G that represents P with suitable parameters.
Learning Bayes Nets Based on Conditional Dependencies 8/28 Our Hybrid Approach. Sample → Set of Dependencies → Final Output Graph. The final selected graph maximizes a model selection score and covers all observed dependencies.
Definition of Hybrid Criterion. Let d be a sample and let S(G, d) be a score function. Let Dep be a set of conditional dependencies extracted from sample d. Graph G optimizes score S given Dep and sample d iff: 1. G entails the dependencies Dep, and 2. if any other graph G' entails Dep, then S(G, d) ≥ S(G', d). [Figure: a small graph over A, B, C, a three-case sample, and its score.]
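A schematic rendering of this criterion (the helper functions score and entails are hypothetical placeholders, not the implementations used in the experiments): among all candidate graphs that entail every observed dependency, return one of maximal score.

```python
# Sketch of the hybrid criterion over an explicit candidate set.
def optimizes_hybrid_criterion(candidate_graphs, dep, data, score, entails):
    """Return a graph that entails every dependency in `dep` and, among such
    graphs, has a maximal score on `data`; None if no candidate entails Dep."""
    admissible = [g for g in candidate_graphs if all(entails(g, d) for d in dep)]
    if not admissible:
        return None
    return max(admissible, key=lambda g: score(g, data))
```

In practice the candidate set is never enumerated explicitly; the constrained local search below approximates this optimization.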
Learning Bayes Nets Based on Conditional Dependencies 10/28 Local Search Heuristics for Constrained Search There is a general method for adapting any local search heuristic to accommodate observed dependencies. Will present adaptation of GES search - call it IGES.
Learning Bayes Nets Based on Conditional Dependencies 11/28 GES Search (Meek, Chickering). Growth Phase: add edges as long as this improves the score. Shrink Phase: delete edges as long as this improves the score. [Figure: example graphs over A, B, C with their scores at each step.]
Learning Bayes Nets Based on Conditional Dependencies 12/28 IGES Search. Step 1: extract dependencies from the sample with a testing procedure. Then modify GES as follows: 1. Continue with the Growth Phase until all dependencies are covered. 2. During the Shrink Phase, delete an edge only if the dependencies are still covered. [Figure: example graphs over A, B, C with scores, given Dep(A,B).] A high-level sketch of the two modifications follows.
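The sketch below paraphrases the slide; best_addition, best_deletion, score, and covers are hypothetical placeholders (not the Tetrad GES code), and edge operations over equivalence classes are abstracted away.

```python
# IGES-style growth/shrink search constrained by observed dependencies.
def iges_search(g, dep, data, score, covers, best_addition, best_deletion):
    # Growth phase: take the best single-edge addition while it improves the
    # score OR while some observed dependency is still uncovered (modification 1).
    while True:
        g_next = best_addition(g, data)        # best single-edge addition, or None
        uncovered = not all(covers(g, d) for d in dep)
        if g_next is None or (score(g_next, data) <= score(g, data) and not uncovered):
            break
        g = g_next

    # Shrink phase: delete an edge only if the score improves AND all observed
    # dependencies remain covered afterwards (modification 2).
    while True:
        g_next = best_deletion(g, data)        # best single-edge deletion, or None
        if (g_next is None or score(g_next, data) <= score(g, data)
                or not all(covers(g_next, d) for d in dep)):
            break
        g = g_next
    return g
```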
Asymptotic Equivalence GES = IGES Theorem Assume that score function S is consistent and that joint probability distribution P satisfies the composition principle. Let Dep be a set of dependencies true of P. Then with P-probability 1, GES and IGES+Dep converge to the same output in the sample size limit. So IGES inherits the convergence properties of GES.
Learning Bayes Nets Based on Conditional Dependencies 14/28 Extracting Dependencies. We use the χ² test (with a cell coverage condition). Exhaustive testing of all triples Indep(X,Y|S) for cardinality(S) < k, where k is chosen by the user. More sophisticated testing strategy coming soon.
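A rough sketch of the exhaustive testing step using a standard χ² test of independence (scipy's chi2_contingency). The minimum-count guard is only a crude stand-in for the cell coverage condition, and treating "dependent in some stratum" as conditional dependence is a simplification of a proper conditional test.

```python
# Extract Dep(X, Y | S) statements by exhaustive chi-squared testing
# over all pairs X, Y and conditioning sets S with |S| < k.
from itertools import combinations
from scipy.stats import chi2_contingency
import pandas as pd

def extract_dependencies(df, k, alpha=0.05, min_count=5):
    deps = []
    variables = list(df.columns)
    for x, y in combinations(variables, 2):
        others = [v for v in variables if v not in (x, y)]
        for size in range(k):                       # |S| = 0, 1, ..., k-1
            for s in combinations(others, size):
                groups = [df] if not s else [g for _, g in df.groupby(list(s))]
                dependent = False
                for g in groups:                    # test X vs Y within each stratum of S
                    table = pd.crosstab(g[x], g[y])
                    if (table.shape[0] < 2 or table.shape[1] < 2
                            or table.values.min() < min_count):
                        continue                    # crude cell-coverage guard
                    _, p_value, _, _ = chi2_contingency(table)
                    if p_value < alpha:
                        dependent = True
                        break
                if dependent:
                    deps.append((x, y, s))
    return deps
```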
Learning Bayes Nets Based on Conditional Dependencies 15/28 Simulation Setup: Methods. The hybrid approach is a general schema. Our setup: Statistical Test: χ². Score S: BDeu (with Tetrad default settings). Search Method: GES, adapted.
Simulation Setup: Graphs and Data. Random DAGs with binary variables. #Nodes: 4, 6, 8, 10. Sample sizes: 100, 200, 400, 800, 1600, 3200, 6400, 12800; random samples per graph per sample size, results averaged. Graphs generated with Tetrad's random DAG utility.
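A sketch of a comparable data-generating setup (the edge probability, uniform CPT entries, and fixed seeds are my assumptions; the experiments used Tetrad's random DAG utility, which may differ in its details):

```python
# Generate a random DAG over binary variables and forward-sample data from it.
import random
from itertools import product

rng = random.Random(0)

def random_dag(n, edge_prob=0.3):
    """Random DAG over nodes 0..n-1; edges always go from lower to higher index."""
    return {j: [i for i in range(j) if rng.random() < edge_prob] for j in range(n)}

def random_cpts(dag):
    """P(node = 1 | parent configuration), drawn uniformly at random."""
    return {node: {cfg: rng.random() for cfg in product([0, 1], repeat=len(ps))}
            for node, ps in dag.items()}

def forward_sample(dag, cpts, n_samples):
    rows = []
    for _ in range(n_samples):
        row = {}
        for node in sorted(dag):                 # index order is an ancestral order
            cfg = tuple(row[p] for p in dag[node])
            row[node] = int(rng.random() < cpts[node][cfg])
        rows.append(row)
    return rows
```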
Result Graphs
Conclusion for I-map learning: The Underfitting Zone. Although not explicitly designed to cover statistically significant correlations, GES+BDeu does so fairly well, but not perfectly; IGES helps add the missing edges (on the order of 5 for 10-node graphs). [Schematic: divergence from the true graph vs. sample size for standard search + score and for constrained search + score — small samples: little significance; medium samples: underfitting of correlations; large samples: convergence zone.]
Learning Bayes Nets Based on Conditional Dependencies 19/28 Part II: Learning-Theoretic Model (COLT 2007). Learning Model: the learner receives an increasing enumeration (list) of conditional dependency statements; data repetition is possible. The learner outputs a graph (pattern), or may output ?. [Example: data Dep(A,B), Dep(B,C), Dep(A,C|B), ...; conjectures: graphs over A, B, C or ?.]
Learning Bayes Nets Based on Conditional Dependencies 20/28 Criteria for Optimal Learning 1. Convergence: Learner must eventually settle on true graph. 2. Learner must minimize mind changes. 3. Given 1 and 2, learner is not dominated in convergence time.
Learning Bayes Nets Based on Conditional Dependencies 21/28 The Optimal Learning Procedure Theorem There is a unique optimal learner defined as follows: 1. If there is a unique graph G covering the observed dependencies with a minimum number of adjacencies, output G. 2. Otherwise output ?.
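A brute-force rendering of this rule over an explicit candidate set (exponential on purpose; the next slide shows that no efficient implementation is to be expected). The helpers entails and num_adjacencies are placeholders, and uniqueness is tested by literal graph identity rather than over patterns (equivalence classes) as in the slides.

```python
# Sketch of the unique-minimum-adjacency learner.
def optimal_learner(candidate_graphs, observed_deps, entails, num_adjacencies):
    covering = [g for g in candidate_graphs
                if all(entails(g, d) for d in observed_deps)]
    if not covering:
        return "?"
    fewest = min(num_adjacencies(g) for g in covering)
    minimal = [g for g in covering if num_adjacencies(g) == fewest]
    # Output the graph only if it is the unique adjacency-minimal cover;
    # otherwise withhold judgment.
    return minimal[0] if len(minimal) == 1 else "?"
```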
Learning Bayes Nets Based on Conditional Dependencies 22/28 Computational Complexity of the Unique Optimal Learner. Theorem: The following problem is NP-hard: 1. Decide if there is a unique edge-minimal I-map for a set of dependencies D. 2. If yes, output the graph. Proof: reduction from Unique Exact 3-Set Cover. Example instance: sets {x1,x2,x3}, {x3,x4,x5}, {x4,x5,x7}, {x2,x4,x5}, {x3,x6,x9}, {x6,x8,x9} over elements x1, ..., x9; the unique exact cover is {x1,x2,x3}, {x4,x5,x7}, {x6,x8,x9}.
Learning Bayes Nets Based on Conditional Dependencies 23/28 Hybrid Method and Optimal Learner. Score-based methods tend to underfit (with discrete variables): they place edges correctly, but too few of them. They are mind-change optimal but not convergence-time optimal. The hybrid method speeds up convergence.
Learning Bayes Nets Based on Conditional Dependencies 24/28 A New Testing Strategy. Say that a graph G satisfies the Markov condition wrt sample d iff for all X, Y: if Y is a nondescendant of X that is not a parent of X, then we do not find Dep(X, Y | parents(X)) in d. Given sample d, look for a graph G that satisfies the Markov condition wrt d with a minimum number of adjacencies.
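A sketch of the check for this condition, assuming a dependency oracle found_dep(x, y, cond_set) that stands in for the statistical test on d; it performs at most one test per ordered pair of variables, i.e. on the order of (#Var)² tests.

```python
# Does graph G satisfy the Markov condition with respect to sample d?
def satisfies_markov_condition(parents, nondescendants, found_dep):
    """parents[x]: set of parents of x; nondescendants[x]: set of nondescendants of x."""
    for x in parents:
        for y in nondescendants[x] - parents[x] - {x}:
            # Y is a nondescendant of X that is not a parent of X:
            # the sample must not exhibit Dep(X, Y | parents(X)).
            if found_dep(x, y, parents[x]):
                return False
    return True
```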
Learning Bayes Nets Based on Conditional Dependencies 25/28 Future Work. Use the Markov condition to develop a local search algorithm for score optimization requiring only (#Var)² tests. Apply the idea of Markov condition + edge minimization to continuous variable models.
Learning Bayes Nets Based on Conditional Dependencies 26/28 Summary: Hybrid Criterion - test, search and score. Basic Idea: base Bayes net learning on dependencies that can be reliably obtained even on small to medium sample sizes. Hybrid criterion: find the graph that maximizes the model selection score under the constraint of entailing the statistically significant dependencies or correlations. Theory and simulation evidence suggest that this speeds up convergence to the correct graph and addresses underfitting on small-to-medium samples.
Learning Bayes Nets Based on Conditional Dependencies 27/28 Summary: Learning-Theoretic Analysis Learning Model: Learn graph from dependencies alone. Optimal Method: look for graph that covers observed dependencies with a minimum number of adjacencies. Implementing this method is NP-hard.
Learning Bayes Nets Based on Conditional Dependencies 28/28 References. O. Schulte, W. Luo and R. Greiner (2007). "Mind Change Optimal Learning of Bayes Net Structure". Conference on Learning Theory (COLT). THE END