Learning Bayes Nets Based on Conditional Dependencies. Oliver Schulte, Simon Fraser University, Vancouver, Canada, with Wei Luo (SFU) and Russ Greiner (U of Alberta).

Outline
1. Brief Intro to Bayes Nets
2. Combining Dependency Information with Model Selection
3. System Architecture
4. Theoretical Analysis and Simulation Results

Bayes Nets: Overview. Bayes net structure = directed acyclic graph (DAG). Nodes = variables of interest. Arcs = direct "influence" or "association". Parameters = CP tables = probability of each child given its parents. The structure represents (in)dependencies; structure + parameters together represent the joint probability distribution over the variables.
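
To make the "structure + parameters = joint distribution" point concrete, here is a minimal Python sketch of a three-node chain A → B → C with illustrative CPTs. The network and the numbers are made up for this example, not taken from the talk; the joint probability is simply the product of the per-node conditional probabilities.

```python
# CPTs: each entry gives the probability of the child being 1 given its parent value.
p_A = 0.3                          # P(A=1)
p_B_given_A = {0: 0.2, 1: 0.9}     # P(B=1 | A)
p_C_given_B = {0: 0.1, 1: 0.7}     # P(C=1 | B)

def bernoulli(p, value):
    """Probability of a binary variable taking `value` when P(=1) is p."""
    return p if value == 1 else 1.0 - p

def joint(a, b, c):
    """Joint probability P(A=a, B=b, C=c) = P(a) * P(b|a) * P(c|b)."""
    return (bernoulli(p_A, a)
            * bernoulli(p_B_given_A[a], b)
            * bernoulli(p_C_given_B[b], c))

# Sanity check: the eight joint probabilities sum to 1.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(joint(1, 1, 0), total)
```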

Examples from CIspace (UBC).

Graphs Entail Dependencies. [Figure: three example graphs over nodes A, B, C, each shown with the conditional dependencies it entails, e.g. Dep(A,B), Dep(A,B|C) for one graph, and Dep(A,B), Dep(A,B|C), Dep(B,C), Dep(B,C|A), Dep(A,C|B) for another.]

I-maps and Probability Distributions. Definition: Graph G is an I-map of probability distribution P ⟺ whenever Dependent(X,Y|S) holds in P, X is d-connected to Y given S in G. Example: if Dependent(Father Eye Color, Mother Eye Color | Child Eye Color) in P, then Father EC is d-connected to Mother EC given Child EC in G. Equivalently, G is an I-map of P ⟺ G entails all conditional dependencies in P. Theorem: Fix G, P. There is a parameter setting θ for G such that (G, θ) represents P ⟺ G is an I-map of P.
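
Since the I-map condition is phrased in terms of d-connection, a small self-contained check is useful for reference. The sketch below is my own illustration, not part of the talk: it tests d-connection via the standard moralized-ancestral-graph construction (restrict to the ancestors of X, Y and the conditioning set, moralize, delete the conditioning nodes, then test undirected reachability).

```python
from collections import deque

def ancestors(dag, nodes):
    """All ancestors of `nodes` (including the nodes themselves) in a DAG
    given as {child: set of parents}."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(dag.get(n, ()))
    return seen

def d_connected(dag, x, y, given):
    """True iff x and y are d-connected given the set `given` in the DAG."""
    keep = ancestors(dag, {x, y} | set(given))
    # Moral graph on the ancestral subgraph: connect each node to its parents
    # and "marry" every pair of co-parents, dropping edge directions.
    adj = {n: set() for n in keep}
    for child in keep:
        parents = [p for p in dag.get(child, ()) if p in keep]
        for p in parents:
            adj[child].add(p)
            adj[p].add(child)
        for i, p in enumerate(parents):
            for q in parents[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # Delete the conditioning nodes, then test undirected reachability.
    blocked = set(given)
    queue, seen = deque([x]), {x}
    while queue:
        n = queue.popleft()
        if n == y:
            return True
        for m in adj[n]:
            if m not in seen and m not in blocked:
                seen.add(m)
                queue.append(m)
    return False

# Collider example A -> C <- B: A and B are d-separated marginally,
# but d-connected once we condition on the common child C.
dag = {"A": set(), "B": set(), "C": {"A", "B"}}
print(d_connected(dag, "A", "B", set()))   # False
print(d_connected(dag, "A", "B", {"C"}))   # True
```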

Two Approaches to Learning Bayes Net Structure. "Search and score": select a graph G as a "model" whose parameters are then estimated. Constraint-based: test for dependencies and find a G that represents (covers) the (in)dependencies in P. Common aim: find a G that represents P with suitable parameters.

Our Hybrid Approach. [Diagram: Sample → Set of (In)Dependencies → Final Output Graph.] The final selected graph maximizes a model selection score and covers all observed (in)dependencies.

Definition of Hybrid Criterion. Let d be a sample and let S(G,d) be a score function. Let Dep be a set of conditional dependencies extracted from sample d. Graph G optimizes score S given Dep and sample d ⟺ (1) G entails the dependencies Dep, and (2) for any other graph G' that entails Dep, S(G,d) ≥ S(G',d). [Figure: example graph over A, B, C with sample cases and scores.]
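
Read as a selection rule, the criterion first filters candidate graphs by dependency coverage and then ranks the survivors by score. The few lines below are a hypothetical sketch of that rule, not the authors' code; `entails` (e.g. a d-connection test as above) and `score` (e.g. BDeu) are caller-supplied placeholders.

```python
def select_graph(candidates, dep, entails, score, data):
    """Hybrid criterion sketch: among candidate graphs, keep those that entail
    every observed dependency, then return the highest-scoring survivor.
    `entails(G, d)` and `score(G, data)` are caller-supplied placeholders."""
    admissible = [G for G in candidates if all(entails(G, d) for d in dep)]
    return max(admissible, key=lambda G: score(G, data)) if admissible else None
```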

Local Search Heuristics for Constrained Search. There is a general method for adapting any local search heuristic to accommodate observed dependencies. We will present the adaptation of GES search and call it IGES.

GES Search (Meek, Chickering). Grow phase: add edges. Shrink phase: delete edges. [Figure: a sequence of graphs over A, B, C, with the score moving through 5, 7, 8, 8.5, and 9 as edges are added and then pruned.]

IGES Search. Step 1: extract dependencies from the sample using a testing procedure. Then: 1. Continue the growth phase until all dependencies are covered. 2. During the shrink phase, delete an edge only if the dependencies are still covered. [Figure: two graphs over A, B, C with scores 7 and 5; given Dep(A,B), the graph covering the dependency is kept.] A schematic of this loop is sketched below.
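
For concreteness, here is a schematic of the constrained grow/shrink loop just described, written as a Python skeleton. This is my own sketch under stated assumptions, not the authors' implementation: `grow` and `shrink` stand for GES-style single-edge operators, and `covers` stands for a d-connection check of the extracted dependencies.

```python
def iges_search(data, deps, start, grow, shrink, covers):
    """Schematic of the constrained (IGES) search loop.

    Assumed interfaces (placeholders, not a real library):
      grow(G, data, deps) -> next graph after the best admissible edge addition,
                             or None once no addition is wanted AND deps are covered
      shrink(G, data)     -> next graph after the best edge deletion, or None
      covers(G, deps)     -> True iff G entails every extracted dependency
    """
    G = start
    # Growth phase: keep adding edges; do not stop until all observed
    # dependencies are covered (this is the change relative to plain GES).
    while True:
        G_next = grow(G, data, deps)
        if G_next is None:
            break
        G = G_next
    # Shrink phase: delete an edge only if the dependencies remain covered.
    while True:
        G_next = shrink(G, data)
        if G_next is None or not covers(G_next, deps):
            break
        G = G_next
    return G
```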

Asymptotic Equivalence GES = IGES. Theorem: Assume that the score function S is consistent and that the joint probability distribution P satisfies the composition principle. Let Dep be a set of dependencies true of P. Then, with P-probability 1, GES and IGES+Dep converge to the same output in the large-sample limit. So IGES inherits the convergence properties of GES.

Extracting Dependencies. We use the χ² test (with a cell coverage condition). Exhaustive testing of all triples Indep(X,Y|S) for cardinality(S) < k, with k chosen by the user. A more sophisticated testing strategy is coming soon.
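
One common way to implement such a test, sketched below as an illustration (not the authors' code): stratify the data on each configuration of the conditioning set S, compute a chi-squared statistic per stratum, sum statistics and degrees of freedom, and compare against the chi-squared distribution. Strata with small cell counts are skipped as a simple stand-in for the cell coverage condition. The sketch assumes discrete data in a pandas DataFrame and uses scipy.

```python
import pandas as pd
from scipy.stats import chi2, chi2_contingency

def chi2_ci_test(df, x, y, cond, alpha=0.05, min_cell=5):
    """Chi-squared test of Dep(x, y | cond) on discrete data.
    Returns True if the dependency is statistically significant."""
    groups = [df] if not cond else [g for _, g in df.groupby(list(cond))]
    stat, dof = 0.0, 0
    for g in groups:
        table = pd.crosstab(g[x], g[y]).to_numpy()
        if table.shape[0] < 2 or table.shape[1] < 2 or table.min() < min_cell:
            continue                      # stratum too sparse to test reliably
        s, _, d, _ = chi2_contingency(table, correction=False)
        stat, dof = stat + s, dof + d
    if dof == 0:
        return False                      # nothing testable: no evidence of dependence
    return chi2.sf(stat, dof) < alpha
```

Exhaustive testing as on the slide then just loops over all variable pairs X, Y and all conditioning sets S of size below k, e.g. via itertools.combinations.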

Simulation Setup: Methods. The hybrid approach is a general schema. Our setup: statistical test: χ², significance level 5%; score S: BDeu (with Tetrad default settings); search method: GES, adapted (IGES).

Simulation Setup: Graphs and Data. Random DAGs with binary variables. Number of nodes: 4, 6, 8, 10. Sample sizes: 100, 200, 400, 800, 1600, 3200, 6400, 12800. Random samples were drawn per graph per sample size and the results averaged. Graphs were generated with Tetrad's random DAG utility.

Show Some Graphs

Conclusion for I-map Learning: the Underfitting Zone. Although not explicitly designed to cover statistically significant correlations, GES+BDeu does so fairly well, but not perfectly, so IGES helps to add in the missing edges (on the order of 5) for 10-node graphs. [Figure: divergence from the true graph vs. sample size for standard search + score and for constrained search + score; small samples show little significance, medium samples show underfitting of correlations, and large samples reach the convergence zone.]

Future Work: More Efficient Testing Strategy. Say that a graph G satisfies the Markov condition w.r.t. sample d ⟺ for all X, Y, if Y is a nonparental nondescendant of X, then we do not find Dep(X, Y | parents(X)). Given sample d, look for a graph G that maximizes the score and satisfies the MC w.r.t. d. This requires only (#Var)² tests; a sketch of the test enumeration is given below.
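
As an illustration of why the number of tests is quadratic in the number of variables, the sketch below (my own illustration, with a hypothetical `ci_test` callback such as the chi-squared test above) enumerates each ordered pair (X, Y) with Y a nonparental nondescendant of X and runs exactly one test per pair, conditioning on parents(X).

```python
from itertools import product

def satisfies_markov_condition(parents, df, ci_test):
    """Check the sample Markov condition from the slide.

    `parents` maps every node to its parent set; `ci_test(df, x, y, cond)` is a
    placeholder returning True when Dep(x, y | cond) is statistically significant.
    At most (#Var)^2 tests are run."""
    # Build the child relation so descendants can be found by DFS.
    children = {}
    for child, pars in parents.items():
        children.setdefault(child, set())
        for p in pars:
            children.setdefault(p, set()).add(child)

    def descendants(n):
        out, stack = set(), [n]
        while stack:
            m = stack.pop()
            for c in children.get(m, ()):
                if c not in out:
                    out.add(c)
                    stack.append(c)
        return out

    for x, y in product(parents, repeat=2):
        if x == y or y in parents[x] or y in descendants(x):
            continue                         # y is a parent or a descendant of x: not tested
        if ci_test(df, x, y, parents[x]):    # a dependency the graph says should be absent
            return False
    return True
```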

Summary: Hybrid Criterion - Test, Search, and Score. Basic idea: base Bayes net learning on dependencies that can be reliably obtained even on small to medium sample sizes. Hybrid criterion: find the graph that maximizes the model selection score under the constraint of entailing the statistically significant dependencies or correlations. Theory and simulation evidence suggest that this (1) speeds up convergence to the correct graph and (2) addresses underfitting on small-to-medium samples. THE END