An Algorithm for Bayesian Network Construction from Data

An Algorithm for Bayesian Network Construction from Data
Jie Cheng, David A. Bell, Weiru Liu (University of Ulster, UK)
Presented by: Jian Xu

Outline
Introduction
Some basic concepts
The proposed algorithm for BN construction
Experiment results
Discussions & comments

What is a Bayesian Network?
Cancer BN example: Metastatic Cancer (M) is a parent of Serum Calcium (S) and Brain Tumor (B); Coma (C) has parents S and B; Headaches (H) has parent B.
P(M = +) = .20
P(S = + | M):  M = + : .80,  M = - : .20
P(B = + | M):  M = + : .20,  M = - : .05
P(C = + | S, B):  + + : .80,  + - : .80,  - + : .80,  - - : .05
P(H = + | B):  B = + : .80,  B = - : .60

Bayesian Network (BN)
A Bayesian network is a compact graphical representation of a probability distribution over a set of domain random variables X = {X1, X2, …, Xn}.
Two components:
Structure: a directed acyclic graph (DAG) over the nodes, which encodes the (often causal) dependence relations in the domain
CPD: each node has a conditional probability distribution associated with it, conditioned on its parents
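To make the two components concrete, here is the factorization that the DAG structure licenses for the Cancer network from the previous slide (a worked instance added for illustration; it is not spelled out on the original slide):

```latex
P(M, S, B, C, H) \;=\; P(M)\, P(S \mid M)\, P(B \mid M)\, P(C \mid S, B)\, P(H \mid B)
```

Each factor is exactly one node's CPD, conditioned on that node's parents in the DAG.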

BN Learning
Structure learning: identify the topology of the network
Score-based methods
Dependency analysis methods
Parameter learning: learn the conditional probabilities for a given network topology
MLE, Bayesian approach, etc.

BN Structure Learning
Search & scoring methods: search for a structure most likely to have generated the data
Use a heuristic search method to construct a model and evaluate it with a scoring metric, such as MDL or a Bayesian score
May not find the best solution
Random restarts: used to avoid getting stuck in a local maximum
Lower time complexity in the worst case, i.e., when the underlying DAG is fully connected

BN Learning Algorithms (Cont’d)
Dependency analysis methods: use conditional independence (CI) tests to analyze the dependency relationships among nodes
Usually asymptotically correct when the data is DAG-faithful
Work efficiently when the underlying network is sparse
CI tests with large condition-sets may be unreliable unless the volume of data is enormous
This is the approach used in the proposed algorithm

Basic Concepts
D-separation: two nodes X and Y are d-separated given C if and only if there exists no path P between X and Y such that:
every collider on P is in C or has a descendant in C, and
no other node on P is in C
C is called a condition-set; a collider on P is a node at which two arcs on P meet head-to-head
Open path: a path between X and Y is open if every node on it is active (the two conditions above hold)
Closed path: a path is closed if any node on it is inactive
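As an illustration of the definition (not part of the original slides), the sketch below tests d-separation with the standard ancestral moral-graph construction; the dictionary-of-parents representation, the function names, and the Cancer-network example at the bottom are my own assumptions.

```python
from itertools import combinations

def ancestors(dag, nodes):
    """All ancestors of `nodes` (inclusive); dag maps node -> set of parents."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for parent in dag.get(stack.pop(), set()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result

def d_separated(dag, x, y, cond):
    """True iff x and y are d-separated given the condition-set `cond`."""
    keep = ancestors(dag, {x, y} | set(cond))
    # Moralize the ancestral subgraph: link each node to its parents and
    # "marry" every pair of parents, then drop edge directions.
    adj = {v: set() for v in keep}
    for child in keep:
        parents = dag.get(child, set()) & keep
        for p in parents:
            adj[child].add(p); adj[p].add(child)
        for p, q in combinations(parents, 2):
            adj[p].add(q); adj[q].add(p)
    # Remove the conditioning nodes and test undirected reachability.
    seen, frontier = {x}, [x]
    while frontier:
        v = frontier.pop()
        if v == y:
            return False            # an open path exists
        for w in adj[v] - set(cond):
            if w not in seen:
                seen.add(w)
                frontier.append(w)
    return True

# Cancer network from the earlier slide (node -> parents).
cancer = {"M": set(), "S": {"M"}, "B": {"M"}, "C": {"S", "B"}, "H": {"B"}}
print(d_separated(cancer, "S", "B", {"M"}))        # True: M blocks the path S-M-B
print(d_separated(cancer, "S", "B", {"M", "C"}))   # False: conditioning on collider C opens S-C-B
```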

Basic Concepts (Cont’d)
DAG-faithful: a distribution is DAG-faithful when there exists a DAG that can represent all of its conditional independence relations
D-map: a graph G is a dependency map (D-map) of a model M if every independence relationship in M is true in G (a graph with no edges is trivially a D-map)
I-map: a graph G is an independency map (I-map) of M if every independence relationship in G is true in M (a fully connected graph is trivially an I-map)
Minimal I-map: G is an I-map of M, but the removal of any arc from G yields a graph that is not an I-map of M
P-map: G is a perfect map of M if it is both a D-map and an I-map of M

Mutual Information
The mutual information of two nodes X_i, X_j is defined as:
I(X_i, X_j) = \sum_{x_i, x_j} P(x_i, x_j) \log \frac{P(x_i, x_j)}{P(x_i)\, P(x_j)}
The conditional mutual information, given a condition-set C, is defined as:
I(X_i, X_j \mid C) = \sum_{x_i, x_j, c} P(x_i, x_j, c) \log \frac{P(x_i, x_j \mid c)}{P(x_i \mid c)\, P(x_j \mid c)}
When this value is smaller than a threshold ε, the two nodes are regarded as (conditionally) independent.
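A small sketch (mine, not the presenters') of how these two quantities can be estimated from discrete data; the function names and the use of base-2 logarithms are assumptions:

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical I(X, Y) from paired samples xs and ys."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((nxy / n) * log2((nxy * n) / (px[x] * py[y]))
               for (x, y), nxy in pxy.items())

def conditional_mutual_information(xs, ys, cs):
    """Empirical I(X, Y | C); cs holds one condition-set value (e.g. a tuple) per record."""
    n = len(xs)
    pxyc = Counter(zip(xs, ys, cs))
    pxc, pyc, pc = Counter(zip(xs, cs)), Counter(zip(ys, cs)), Counter(cs)
    return sum((nxyc / n) * log2((nxyc * pc[c]) / (pxc[(x, c)] * pyc[(y, c)]))
               for (x, y, c), nxyc in pxyc.items())

# Example: two strongly dependent binary attributes.
x = [0, 0, 0, 1, 1, 1, 1, 0]
y = [0, 0, 1, 1, 1, 1, 0, 0]
print(mutual_information(x, y))   # clearly above a small threshold such as 0.003
```

In the algorithm, a pair is treated as (conditionally) dependent when the corresponding value is at least ε.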

Assumptions
All attributes are discrete
No missing values in any record
All records are drawn independently from a single probability model
The dataset is large enough for reliable CI tests
The ordering of the attributes is available before the network construction

An Algorithm for BN Construction
Drafting: compute the mutual information of each pair of nodes and create a draft of the model
Thickening: add arcs between pairs of nodes that cannot be d-separated, yielding an I-map of the model
Thinning: examine each arc of the I-map with a CI test and remove it if the two nodes it connects are conditionally independent

Drafting Phase
1. Initiate a graph G(V, E) with V = {all nodes} and E = { }; initiate two empty ordered lists S and R.
2. For each pair of nodes (v_i, v_j), i ≠ j, compute I(v_i, v_j). Sort all pairs with I(v_i, v_j) ≥ ε from large to small and put them into the ordered set S.
3. Take the first two pairs of nodes from S, remove them from S, and add the corresponding arcs to E (the direction of each arc is determined by the available node ordering).
4. Take the first pair of nodes remaining in S and remove it from S. If there is no open path between the two nodes (i.e., they are d-separated given the empty set), add the corresponding arc to E; otherwise append the pair to the end of the ordered set R.
5. Repeat step 4 until S is empty.
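A condensed Python sketch of the drafting phase (my own rendering, reusing the mutual_information and d_separated helpers sketched earlier; the data layout of one sample list per variable and the eps default are assumptions):

```python
def draft(variables, data, order, eps=0.003):
    """Phase I: data maps variable -> list of samples, order maps variable -> position."""
    # Step 2: compute I(a, b) for every pair and keep those with I >= eps, largest first.
    pairs = sorted(
        ((mutual_information(data[a], data[b]), a, b)
         for i, a in enumerate(variables) for b in variables[i + 1:]),
        reverse=True)
    pairs = [(mi, a, b) for mi, a, b in pairs if mi >= eps]
    dag = {v: set() for v in variables}   # child -> set of parents (the arcs E)
    remaining = []                        # the ordered set R
    for k, (mi, a, b) in enumerate(pairs):
        # Step 3: accept the first two pairs unconditionally.
        # Step 4: accept a later pair only if its nodes are d-separated given the empty set.
        if k < 2 or d_separated(dag, a, b, set()):
            parent, child = (a, b) if order[a] < order[b] else (b, a)
            dag[child].add(parent)
        else:
            remaining.append((a, b))
    return dag, remaining
```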

Drafting Example
Figure (a) is the underlying BN structure.
The pairwise mutual information values sort as I(B,D) ≥ I(C,E) ≥ I(B,E) ≥ I(A,B) ≥ I(B,C) ≥ I(C,D) ≥ I(D,E) ≥ I(A,D) ≥ I(A,E) ≥ I(A,C) ≥ ε.
Figure (b) is the resulting draft graph.

Thickening Phase
6. Take the first pair of nodes from R and remove it from R.
7. Find a block set that blocks every open path between these two nodes using a minimum number of nodes. Conduct a CI test; if the two nodes are still dependent on each other given the block set, connect them with an arc.
8. Repeat from step 6 until R is empty.
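A matching sketch of the thickening phase (again my own rendering; find_block_set is the procedure named on the later "Finding Minimum Block Set" slide, and a brute-force stand-in for it is sketched there):

```python
def thicken(dag, data, remaining, order, eps=0.003):
    """Phase II: revisit the pairs set aside in R during drafting."""
    for a, b in remaining:
        # Step 7: block every open path with a minimum set of nodes, then run a CI test.
        block = sorted(find_block_set(dag, a, b))
        cond = list(zip(*(data[v] for v in block))) if block else [()] * len(data[a])
        if conditional_mutual_information(data[a], data[b], cond) >= eps:
            parent, child = (a, b) if order[a] < order[b] else (b, a)
            dag[child].add(parent)      # still dependent given the block set: add the arc
    return dag
```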

Thickening Example
Figure (b) is the draft graph.
Examining the pair (D,E), the minimum set that blocks all open paths between D and E is {B}.
A CI test reveals that D and E are dependent given {B}, so the arc (D,E) is added.
The arc (A,C) is not added because A and C are independent given {B}.

Thinning Phase
9. For each arc in E: if there are open paths between its two nodes besides this arc, remove the arc from E temporarily and call find_block_set(current graph, node1, node2). Conduct a CI test conditioned on the block set. If the two nodes are still dependent, add the arc back to E; otherwise remove it permanently.
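A sketch of the thinning phase in the same style (mine; leaving an arc in place when it is the only connection between its endpoints follows the wording of step 9):

```python
def thin(dag, data, eps=0.003):
    """Phase III: try to remove every arc; keep the removal only if a CI test allows it."""
    for child in list(dag):
        for parent in list(dag[child]):
            dag[child].discard(parent)                       # remove the arc temporarily
            if d_separated(dag, parent, child, set()):
                dag[child].add(parent)                       # no other open path: keep the arc
                continue
            block = sorted(find_block_set(dag, parent, child))
            cond = list(zip(*(data[v] for v in block))) if block else [()] * len(data[parent])
            if conditional_mutual_information(data[parent], data[child], cond) >= eps:
                dag[child].add(parent)                       # still dependent: restore the arc
            # otherwise the arc stays removed permanently
    return dag
```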

Thinning Example
Figure (c) is an I-map of the underlying BN.
The arc (B,E) is removed because B and E are independent of each other given {C,D}.
Figure (d) is a perfect map of the underlying dependency model shown in Figure (a).

Finding Minimum Block Set
(This slide presents the find_block_set procedure as a figure: it returns a minimum set of nodes that closes every open path between two given nodes.)
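Since the procedure itself did not survive in this transcript, here is a brute-force stand-in for find_block_set (the paper's actual procedure is a more efficient heuristic that this sketch does not reproduce):

```python
from itertools import combinations

def find_block_set(dag, a, b):
    """Smallest set of nodes that d-separates a and b in the current graph
    (exhaustive search over subsets, smallest first; exponential, for illustration only)."""
    others = [v for v in dag if v not in (a, b)]
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            if d_separated(dag, a, b, set(subset)):
                return set(subset)
    return set(others)   # only reached if a and b are adjacent and cannot be separated
```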

Complexity Analysis
For a dataset with N attributes, each with at most r possible values, and at most k parents per node:
Phase I: N^2 mutual information computations, each requiring O(r^2) basic operations, giving O(N^2 r^2)
Phase II: at most N^2 CI tests, each with at most O(r^(k+2)) basic operations, giving O(N^2 r^(k+2)); worst case O(N^2 r^N)
Phase III: same as Phase II

ALARM Network Structure
(Figure: the structure of the ALARM network.)

Experiment Setup
ALARM BN (A Logical Alarm Reduction Mechanism): a medical diagnosis network for patient monitoring
37 nodes, 46 arcs
3 versions: same structure, different CPDs
10,000 cases in each dataset
The conditional mutual information calculation is modified to take each variable's degrees of freedom into consideration, making the CI tests more reliable
ε = 0.003

Result on ALARM BN
(Table of results on the ALARM datasets.)

Discussions & Comments
About the assumptions:
All attributes are discrete
No missing values in any record
The dataset is large enough for reliable CI tests
The ordering of the attributes is available before the network construction

Discussions & Comments
Threshold ε (set to 0.003 in the experiments):
How do we pick an appropriate ε?
How does the choice of ε affect accuracy and running time?
Modification in the experiments:
The conditional mutual information calculation was modified to take each variable's degrees of freedom into consideration, to make the CI tests more reliable
Does this modification affect the results in any way other than increasing the accuracy?

Thank you!