Learning Bayes Nets Based on Conditional Dependencies. Oliver Schulte, Department of Philosophy and School of Computing Science, Simon Fraser University, Vancouver, Canada. With Wei Luo (Simon Fraser) and Russ Greiner (U of Alberta).
Learning Bayes Nets Based on Conditional Dependencies 2/28 Outline 1. Brief Intro to Bayes Nets 2. Combining Dependency Information with Model Selection 3. Learning from Dependency Data Only: Learning-Theoretic Analysis
Learning Bayes Nets Based on Conditional Dependencies 3/28 Bayes Nets: Overview Bayes Net Structure = Directed Acyclic Graph. Nodes = Variables of Interest. Arcs = direct “influence”, “association”. Parameters = CP Tables = Prob of Child given Parents. Structure represents (in)dependencies. Structure + parameters represents joint probability distribution over variables.
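For concreteness, a minimal illustrative sketch (not from the slides; the three-node net, the CP table numbers, and the function name are made up) of how a structure plus CP tables determines the joint distribution: a full assignment gets probability equal to the product, over all nodes, of the child's CP entry given its parents' values.

```python
# Hypothetical 3-node net A -> B -> C with binary variables.
# Structure: parent map.  Parameters: CP tables (child value given parent values).
parents = {"A": [], "B": ["A"], "C": ["B"]}

# CP tables: (tuple of parent values, child value) -> probability.
cpt = {
    "A": {((), 1): 0.3, ((), 0): 0.7},
    "B": {((1,), 1): 0.9, ((1,), 0): 0.1, ((0,), 1): 0.2, ((0,), 0): 0.8},
    "C": {((1,), 1): 0.5, ((1,), 0): 0.5, ((0,), 1): 0.05, ((0,), 0): 0.95},
}

def joint_probability(assignment):
    """Probability of a full assignment {var: value} via the chain rule
    P(x1,...,xn) = prod_i P(xi | parents(xi))."""
    p = 1.0
    for var, parent_list in parents.items():
        parent_vals = tuple(assignment[q] for q in parent_list)
        p *= cpt[var][(parent_vals, assignment[var])]
    return p

print(joint_probability({"A": 1, "B": 1, "C": 0}))  # 0.3 * 0.9 * 0.5 = 0.135
```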
Learning Bayes Nets Based on Conditional Dependencies 4/28 Examples from CIspace (UBC)
Learning Bayes Nets Based on Conditional Dependencies 5/28 Graphs entail Dependencies. [Figure: three example graphs over nodes A, B, C, each shown with the conditional dependencies it entails, e.g. Dep(A,B), Dep(A,B|C) for one graph, and Dep(A,B), Dep(A,B|C), Dep(B,C), Dep(B,C|A), Dep(A,C|B) for another.]
Learning Bayes Nets Based on Conditional Dependencies 6/28 I-maps and Probability Distributions. Defn: Graph G is an I-map of prob dist P iff: if Dependent(X,Y|S) in P, then X is d-connected to Y given S in G. Example: If Dependent(Father Eye Color, Mother Eye Color | Child Eye Color) in P, then Father EC is d-connected to Mother EC given Child EC in G. Informally, G is an I-map of P iff G entails all conditional dependencies in P. Theorem: Fix G, P. There is a parameter setting θ for G such that (G, θ) represents P iff G is an I-map of P.
Two Approaches to Learning Bayes Net Structure. "Search and score": select graph G as a "model" with parameters to be estimated. "Test and cover": find G that represents the dependencies in P. Common aim: find G that represents P with suitable parameters.
Learning Bayes Nets Based on Conditional Dependencies 8/28 Our Hybrid Approach. Sample → Set of Dependencies → Final Output Graph. The final selected graph maximizes a model selection score and covers all observed dependencies.
Definition of Hybrid Criterion. Let d be a sample and let S(G, d) be a score function. Let Dep be a set of conditional dependencies extracted from sample d. Graph G optimizes score S given Dep and sample d iff: 1. G entails the dependencies Dep, and 2. if any other graph G' entails Dep, then S(G, d) ≥ S(G', d). [Figure: a small graph over A, B, C, a three-case sample, and its score.]
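A schematic rendering of this criterion (the helper functions score and entails are hypothetical placeholders, not the implementations used in the experiments): among all candidate graphs that entail every observed dependency, return one of maximal score.

```python
# Sketch of the hybrid criterion over an explicit candidate set.
def optimizes_hybrid_criterion(candidate_graphs, dep, data, score, entails):
    """Return a graph that entails every dependency in `dep` and, among such
    graphs, has a maximal score on `data`; None if no candidate entails Dep."""
    admissible = [g for g in candidate_graphs if all(entails(g, d) for d in dep)]
    if not admissible:
        return None
    return max(admissible, key=lambda g: score(g, data))
```

In practice the candidate set is never enumerated explicitly; the constrained local search below approximates this optimization.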
Learning Bayes Nets Based on Conditional Dependencies 10/28 Local Search Heuristics for Constrained Search There is a general method for adapting any local search heuristic to accommodate observed dependencies. Will present adaptation of GES search - call it IGES.
Learning Bayes Nets Based on Conditional Dependencies 11/28 GES Search (Meek, Chickering). Growth Phase: add edges as long as this improves the score. Shrink Phase: delete edges as long as this improves the score. [Figure: example graphs over A, B, C with their scores at each step.]
Learning Bayes Nets Based on Conditional Dependencies 12/28 IGES Search. Step 1: extract dependencies from the sample with a testing procedure. Then modify GES as follows: 1. Continue with the Growth Phase until all dependencies are covered. 2. During the Shrink Phase, delete an edge only if the dependencies are still covered. [Figure: example graphs over A, B, C with scores, given Dep(A,B).] A high-level sketch of the two modifications follows.
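The sketch below paraphrases the slide; best_addition, best_deletion, score, and covers are hypothetical placeholders (not the Tetrad GES code), and edge operations over equivalence classes are abstracted away.

```python
# IGES-style growth/shrink search constrained by observed dependencies.
def iges_search(g, dep, data, score, covers, best_addition, best_deletion):
    # Growth phase: take the best single-edge addition while it improves the
    # score OR while some observed dependency is still uncovered (modification 1).
    while True:
        g_next = best_addition(g, data)        # best single-edge addition, or None
        uncovered = not all(covers(g, d) for d in dep)
        if g_next is None or (score(g_next, data) <= score(g, data) and not uncovered):
            break
        g = g_next

    # Shrink phase: delete an edge only if the score improves AND all observed
    # dependencies remain covered afterwards (modification 2).
    while True:
        g_next = best_deletion(g, data)        # best single-edge deletion, or None
        if (g_next is None or score(g_next, data) <= score(g, data)
                or not all(covers(g_next, d) for d in dep)):
            break
        g = g_next
    return g
```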
Asymptotic Equivalence GES = IGES Theorem Assume that score function S is consistent and that joint probability distribution P satisfies the composition principle. Let Dep be a set of dependencies true of P. Then with P-probability 1, GES and IGES+Dep converge to the same output in the sample size limit. So IGES inherits the convergence properties of GES.
Learning Bayes Nets Based on Conditional Dependencies 14/28 Extracting Dependencies. We use the χ² test (with a cell coverage condition). Exhaustive testing of all triples Indep(X,Y|S) for cardinality(S) < k, where k is chosen by the user. More sophisticated testing strategy coming soon.
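A rough sketch of the exhaustive testing step using a standard χ² test of independence (scipy's chi2_contingency). The minimum-count guard is only a crude stand-in for the cell coverage condition, and treating "dependent in some stratum" as conditional dependence is a simplification of a proper conditional test.

```python
# Extract Dep(X, Y | S) statements by exhaustive chi-squared testing
# over all pairs X, Y and conditioning sets S with |S| < k.
from itertools import combinations
from scipy.stats import chi2_contingency
import pandas as pd

def extract_dependencies(df, k, alpha=0.05, min_count=5):
    deps = []
    variables = list(df.columns)
    for x, y in combinations(variables, 2):
        others = [v for v in variables if v not in (x, y)]
        for size in range(k):                       # |S| = 0, 1, ..., k-1
            for s in combinations(others, size):
                groups = [df] if not s else [g for _, g in df.groupby(list(s))]
                dependent = False
                for g in groups:                    # test X vs Y within each stratum of S
                    table = pd.crosstab(g[x], g[y])
                    if (table.shape[0] < 2 or table.shape[1] < 2
                            or table.values.min() < min_count):
                        continue                    # crude cell-coverage guard
                    _, p_value, _, _ = chi2_contingency(table)
                    if p_value < alpha:
                        dependent = True
                        break
                if dependent:
                    deps.append((x, y, s))
    return deps
```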
Learning Bayes Nets Based on Conditional Dependencies 15/28 Simulation Setup: Methods. The hybrid approach is a general schema. Our setup: Statistical Test: χ². Score S: BDeu (with Tetrad default settings). Search Method: GES, adapted.
Simulation Setup: Graphs and Data. Random DAGs with binary variables. #Nodes: 4, 6, 8, 10. Sample sizes: 100, 200, 400, 800, 1600, 3200, 6400, 12800; random samples per graph per sample size, results averaged. Graphs generated with Tetrad's random DAG utility.
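A sketch of a comparable data-generating setup (the edge probability, uniform CPT entries, and fixed seeds are my assumptions; the experiments used Tetrad's random DAG utility, which may differ in its details):

```python
# Generate a random DAG over binary variables and forward-sample data from it.
import random
from itertools import product

rng = random.Random(0)

def random_dag(n, edge_prob=0.3):
    """Random DAG over nodes 0..n-1; edges always go from lower to higher index."""
    return {j: [i for i in range(j) if rng.random() < edge_prob] for j in range(n)}

def random_cpts(dag):
    """P(node = 1 | parent configuration), drawn uniformly at random."""
    return {node: {cfg: rng.random() for cfg in product([0, 1], repeat=len(ps))}
            for node, ps in dag.items()}

def forward_sample(dag, cpts, n_samples):
    rows = []
    for _ in range(n_samples):
        row = {}
        for node in sorted(dag):                 # index order is an ancestral order
            cfg = tuple(row[p] for p in dag[node])
            row[node] = int(rng.random() < cpts[node][cfg])
        rows.append(row)
    return rows
```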
Result Graphs
Conclusion for I-map learning: The Underfitting Zone. Although not explicitly designed to cover statistically significant correlations, GES+BDeu does so fairly well, but not perfectly; IGES helps add the missing edges (on the order of 5 for 10-node graphs). [Schematic: divergence from the true graph vs. sample size for standard search + score and for constrained search + score — small samples: little significance; medium samples: underfitting of correlations; large samples: convergence zone.]
Learning Bayes Nets Based on Conditional Dependencies 19/28 Part II: Learning-Theoretic Model (COLT 2007). Learning Model: the learner receives an increasing enumeration (list) of conditional dependency statements; data repetition is possible. The learner outputs a graph (pattern), or may output ?. [Example: data Dep(A,B), Dep(B,C), Dep(A,C|B), ...; conjectures: graphs over A, B, C or ?.]
Learning Bayes Nets Based on Conditional Dependencies 20/28 Criteria for Optimal Learning 1. Convergence: Learner must eventually settle on true graph. 2. Learner must minimize mind changes. 3. Given 1 and 2, learner is not dominated in convergence time.
Learning Bayes Nets Based on Conditional Dependencies 21/28 The Optimal Learning Procedure Theorem There is a unique optimal learner defined as follows: 1. If there is a unique graph G covering the observed dependencies with a minimum number of adjacencies, output G. 2. Otherwise output ?.
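A brute-force rendering of this rule over an explicit candidate set (exponential on purpose; the next slide shows that no efficient implementation is to be expected). The helpers entails and num_adjacencies are placeholders, and uniqueness is tested by literal graph identity rather than over patterns (equivalence classes) as in the slides.

```python
# Sketch of the unique-minimum-adjacency learner.
def optimal_learner(candidate_graphs, observed_deps, entails, num_adjacencies):
    covering = [g for g in candidate_graphs
                if all(entails(g, d) for d in observed_deps)]
    if not covering:
        return "?"
    fewest = min(num_adjacencies(g) for g in covering)
    minimal = [g for g in covering if num_adjacencies(g) == fewest]
    # Output the graph only if it is the unique adjacency-minimal cover;
    # otherwise withhold judgment.
    return minimal[0] if len(minimal) == 1 else "?"
```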
Learning Bayes Nets Based on Conditional Dependencies 22/28 Computational Complexity of the Unique Optimal Learner. Theorem: The following problem is NP-hard: 1. Decide if there is a unique edge-minimal I-map for a set of dependencies D. 2. If yes, output the graph. Proof: reduction from Unique Exact 3-Set Cover. Example instance: sets {x1,x2,x3}, {x3,x4,x5}, {x4,x5,x7}, {x2,x4,x5}, {x3,x6,x9}, {x6,x8,x9} over elements x1, ..., x9; the unique exact cover is {x1,x2,x3}, {x4,x5,x7}, {x6,x8,x9}.
Learning Bayes Nets Based on Conditional Dependencies 23/28 Hybrid Method and Optimal Learner. Score-based methods tend to underfit (with discrete variables): they place edges correctly, but too few of them. They are mind-change optimal but not convergence-time optimal. The hybrid method speeds up convergence.
Learning Bayes Nets Based on Conditional Dependencies 24/28 A New Testing Strategy. Say that a graph G satisfies the Markov condition wrt sample d iff for all X, Y: if Y is a nondescendant of X that is not a parent of X, then we do not find Dep(X, Y | parents(X)) in d. Given sample d, look for a graph G that satisfies the Markov condition wrt d with a minimum number of adjacencies.
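A sketch of the check for this condition, assuming a dependency oracle found_dep(x, y, cond_set) that stands in for the statistical test on d; it performs at most one test per ordered pair of variables, i.e. on the order of (#Var)² tests.

```python
# Does graph G satisfy the Markov condition with respect to sample d?
def satisfies_markov_condition(parents, nondescendants, found_dep):
    """parents[x]: set of parents of x; nondescendants[x]: set of nondescendants of x."""
    for x in parents:
        for y in nondescendants[x] - parents[x] - {x}:
            # Y is a nondescendant of X that is not a parent of X:
            # the sample must not exhibit Dep(X, Y | parents(X)).
            if found_dep(x, y, parents[x]):
                return False
    return True
```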
Learning Bayes Nets Based on Conditional Dependencies 25/28 Future Work. Use the Markov condition to develop a local search algorithm for score optimization requiring only (#Var)² tests. Apply the idea of Markov condition + edge minimization to continuous variable models.
Learning Bayes Nets Based on Conditional Dependencies 26/28 Summary: Hybrid Criterion - test, search and score. Basic Idea: base Bayes net learning on dependencies that can be reliably obtained even on small to medium sample sizes. Hybrid criterion: find the graph that maximizes the model selection score under the constraint of entailing the statistically significant dependencies or correlations. Theory and simulation evidence suggest that this speeds up convergence to the correct graph and addresses underfitting on small-to-medium samples.
Learning Bayes Nets Based on Conditional Dependencies 27/28 Summary: Learning-Theoretic Analysis Learning Model: Learn graph from dependencies alone. Optimal Method: look for graph that covers observed dependencies with a minimum number of adjacencies. Implementing this method is NP-hard.
Learning Bayes Nets Based on Conditional Dependencies 28/28 References. O. Schulte, W. Luo and R. Greiner (2007). "Mind Change Optimal Learning of Bayes Net Structure". Conference on Learning Theory (COLT). THE END