Learning Bayesian Networks with Local Structure by Nir Friedman and Moises Goldszmidt

Objective: to represent and learn the local structure in the CPDs.

Table of Contents
1. Introduction
2. Learning Bayesian Networks (MDL/BDe scores; MDL: Minimal Description Length)
3. Learning Local Structure (MDL/BDe scores for default tables and decision trees; algorithms)
4. Experimental Results

1. Introduction

A Bayesian network = DAG (global structure) + CPDs (local structure). (DAG: Directed Acyclic Graph; CPD: Conditional Probability Distribution.) The CPDs themselves can be represented by local structures: tables, decision trees, noisy-or gates, etc.

When a CPD is encoded by a table, the table is exponential in the number of parents of X. Running example: A = alarm armed, B = burglary, E = earthquake, S = loud alarm sound (all variables binary); the tabular CPD for S given A, B, E therefore has 2^3 = 8 rows.
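For concreteness, here is that tabular CPD written out in Python; the probabilities are invented for illustration (the slide gives none):

# Tabular CPD for P(S = true | A, B, E): one row per parent
# instantiation, so the table is exponential (2^3 = 8 rows) in the
# number of parents.  Probabilities are illustrative only.
table_cpd = {
    # (A, B, E) -> P(S = true | A, B, E)
    (True,  True,  True):  0.98,
    (True,  True,  False): 0.95,
    (True,  False, True):  0.40,
    (True,  False, False): 0.01,
    (False, True,  True):  0.0,  # alarm not armed: never sounds,
    (False, True,  False): 0.0,  # regardless of B and E; these
    (False, False, True):  0.0,  # four rows all carry the same
    (False, False, False): 0.0,  # information
}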

The learning of local structures is motivated by CSI (Context-Specific Independence; Boutilier et al., 1996). Two such representations are the default table and the decision tree (Quinlan and Rivest, 1989). Improvements over tabular CPDs:
1. The induced parameters are more reliable, since each one is estimated from more data.
2. The induced global structure is a better approximation to the real dependencies, since local representations avoid the exponential penalty a tabular CPD attaches to each additional parent. Both representations are sketched below.
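The table above compresses well under CSI. One possible encoding of the two representations (an assumed layout for illustration, not the paper's data structures):

# Default table: explicit rows only where needed; every parent
# instantiation not listed falls through to the shared default row.
default_table = {
    "rows": {
        (True, True,  True):  0.98,
        (True, True,  False): 0.95,
        (True, False, True):  0.40,
        (True, False, False): 0.01,
    },
    "default": 0.0,  # covers all four (A=False, *, *) instantiations
}

# Decision tree: test A first; in the context A = False, S is
# independent of B and E (CSI), so that branch is a single leaf.
tree = ("A", {
    True:  ("B", {True:  ("E", {True: 0.98, False: 0.95}),
                  False: ("E", {True: 0.40, False: 0.01})}),
    False: 0.0,
})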

2. Learning Bayesian Networks

A Bayesian network for $U = \{X_1, \ldots, X_n\}$ is a pair $B = \langle G, L \rangle$, where $G$ is a DAG and $L$ is a set of CPDs. $G$ encodes the assumption that each $X_i$ is independent of its nondescendants given its parents $Pa_i$, so that $P_B(X_1, \ldots, X_n) = \prod_i P(X_i \mid Pa_i)$.

Problem: given a training set $D = \{u_1, \ldots, u_N\}$ of instances of $U$, find a network $B = \langle G, L \rangle$ that best matches $D$.
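The factorization is what makes $P_B$ computable term by term. A minimal sketch (a generic helper, not the paper's code):

def joint_prob(instance, parents, cpds):
    """P_B(u) = prod_i P(x_i | pa_i), the factorization encoded by G.

    instance: {variable: value}; parents: {variable: tuple of parent
    variables}; cpds: {variable: {parent-value tuple: {value: prob}}}.
    """
    p = 1.0
    for x, pa in parents.items():
        pa_vals = tuple(instance[v] for v in pa)
        p *= cpds[x][pa_vals][instance[x]]
    return p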

2.1. MDL Score (Rissanen, 1989)

code length(data) = code length(model) + code length(data | model), with data $D$ and model $B$ (inducing $P_B$): a balance between complexity and accuracy.

Total description length: $\mathrm{DL}(B, D) = \mathrm{DL}(G) + \mathrm{DL}(L) + \mathrm{DL}(D \mid B)$.

The data term uses the optimal code length under $P_B$ (Cover and Thomas, 1991): $\mathrm{DL}(D \mid B) = -\sum_{i=1}^{N} \log P_B(u_i)$.
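Putting the three terms together, a minimal sketch of the MDL score for tabular CPDs, assuming the usual half-a-log-N bits per free parameter and reusing joint_prob from the sketch above:

import math

def mdl_score(data, parents, cpds, dl_graph):
    """DL(B, D) = DL(G) + DL(L) + DL(D | B).

    dl_graph: bits to describe the DAG (computed elsewhere).
    Each free parameter costs (1/2) log2 N bits; each instance
    costs -log2 P_B(u) bits under the network's distribution.
    """
    N = len(data)
    n_params = sum(
        len(rows) * (len(next(iter(rows.values()))) - 1)
        for rows in cpds.values()
    )
    dl_params = 0.5 * n_params * math.log2(N)
    dl_data = -sum(math.log2(joint_prob(u, parents, cpds)) for u in data)
    return dl_graph + dl_params + dl_data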

2.2. BDe Score

Bayes rule: $P(G \mid D) \propto P(G)\, P(D \mid G)$, where $P(D \mid G) = \int P(D \mid G, \Theta)\, P(\Theta \mid G)\, d\Theta$ and $\Theta$ is the vector of parameters for the CPDs quantifying $G$.

Under a Dirichlet prior with hyperparameters $N'_{ijk}$, the integral has a closed form:
$P(D \mid G) = \prod_i \prod_j \frac{\Gamma(N'_{ij})}{\Gamma(N'_{ij} + N_{ij})} \prod_k \frac{\Gamma(N'_{ijk} + N_{ijk})}{\Gamma(N'_{ijk})}$,
where $N_{ijk}$ counts the instances with $X_i = k$ and $Pa_i = j$, and $N_{ij} = \sum_k N_{ijk}$ (similarly for $N'_{ij}$).

Asymptotic equivalence of the MDL and BDe scores (Schwarz, 1978): $\log P(D \mid G) = \log P(D \mid G, \hat{\Theta}) - \frac{\dim(G)}{2} \log N + O(1)$, so maximizing the BDe score asymptotically coincides with minimizing the MDL score.
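The closed form is easy to evaluate in log space. A sketch of the per-family term (the product over $i$ becomes a sum of these), with assumed dict-of-dicts layouts for the counts and hyperparameters:

from math import lgamma

def log_bde_family(counts, alpha):
    """log of  prod_j Gamma(N'_ij)/Gamma(N'_ij + N_ij)
             * prod_k Gamma(N'_ijk + N_ijk)/Gamma(N'_ijk)
    for one variable X_i.

    counts[j][k] = N_ijk, instances with parent context j, X_i = k;
    alpha[j][k]  = N'_ijk, the Dirichlet hyperparameters.
    """
    total = 0.0
    for j, row in counts.items():
        a_j = sum(alpha[j].values())
        n_j = sum(row.values())
        total += lgamma(a_j) - lgamma(a_j + n_j)
        for k, n_jk in row.items():
            total += lgamma(alpha[j][k] + n_jk) - lgamma(alpha[j][k])
    return total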

3. Learning Local Structure

3.1. Scoring Functions

A local representation is a pair $L = (S_L, \Theta_{S_L})$, where $S_L$ is the structure of the local representation (e.g., a default table or a decision tree) and $\Theta_{S_L}$ is its parameterization. $\mathrm{Rows}(S_L)$ is a partition of $\mathrm{Val}(Pa_i)$; each parent instantiation $pa_i$ is mapped to the element of the partition that contains it.
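A sketch of that mapping, using the default-table and tree encodings assumed in Section 1 (the parent order (A, B, E) is part of that assumed encoding):

def row_of(structure, pa):
    """Map a parent instantiation pa ({variable: value}) to the row
    of Rows(S_L) that contains it."""
    if isinstance(structure, dict):              # default table
        key = tuple(pa[v] for v in ("A", "B", "E"))
        return key if key in structure["rows"] else "default"
    node, path = structure, []                   # decision tree
    while isinstance(node, tuple):               # internal test node
        var, children = node
        path.append((var, pa[var]))
        node = children[pa[var]]
    return tuple(path)                           # one leaf = one row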

MDL score for local structure:

Encoding of $S_L$: for a default table, describe which $k - 1$ parent values receive explicit rows (the remaining values share the default row), where $k = |\mathrm{Rows}(S_L)|$; for a tree, encode each leaf by a bit set to 0 and each internal node by a bit set to 1 followed by the description of its test variable ($\log |Pa_i|$ bits) and the encodings of its subtrees.

Encoding of $\Theta_{S_L}$: $\frac{1}{2} k (\|X_i\| - 1) \log N$ bits, i.e., half a $\log N$ per free parameter.

MDL score: $\mathrm{DL}(S_L) + \mathrm{DL}(\Theta_{S_L}) + \mathrm{DL}(D \mid L)$.
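A sketch of these description lengths under the encodings just described; the binomial term for the default table is my reading of the "which rows are explicit" encoding, not a formula preserved in the transcript:

from math import comb, log2

def dl_default_table(n_parent_vals, k):
    """Bits to say which k - 1 parent values get explicit rows;
    the remaining values share the default row."""
    return log2(comb(n_parent_vals, k - 1))

def dl_tree_structure(node, n_parents):
    """Bits to describe a tree: one bit per node (leaf = 0,
    internal = 1), plus log2 |Pa_i| bits naming each internal
    node's test variable, plus the cost of its subtrees."""
    if not isinstance(node, tuple):              # leaf
        return 1.0
    _, children = node
    return 1.0 + log2(n_parents) + sum(
        dl_tree_structure(c, n_parents) for c in children.values())

def dl_local_params(k, n_child_vals, N):
    """(1/2) log2 N bits per free parameter: k rows, each with
    ||X_i|| - 1 free parameters."""
    return 0.5 * k * (n_child_vals - 1) * log2(N)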

BDe score for local structure:

Bayes rule: $P(S_L \mid D) \propto P(S_L)\, P(D \mid S_L)$. A natural prior over local structures penalizes their description length, e.g. $P(S_L) \propto 2^{-\mathrm{DL}(S_L)}$. Under a Dirichlet prior on the parameters, the marginal likelihood again has a closed form, now with the rows of $S_L$ playing the role of the parent contexts:
$P(D \mid S_L) = \prod_{r \in \mathrm{Rows}(S_L)} \frac{\Gamma(N'_r)}{\Gamma(N'_r + N_r)} \prod_k \frac{\Gamma(N'_{rk} + N_{rk})}{\Gamma(N'_{rk})}$.
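Combining the description-length prior with the per-row Dirichlet term gives the local BDe score. A short sketch reusing log_bde_family from Section 2.2, with rows as the contexts; the 2^(-DL) prior is the assumption stated above:

from math import log

def log_bde_local(counts_by_row, alpha_by_row, dl_structure):
    """log P(S_L) + log P(D | S_L), with P(S_L) proportional to
    2^(-DL(S_L)); the Dirichlet term sums over Rows(S_L) instead
    of over full parent instantiations."""
    return -dl_structure * log(2) + log_bde_family(counts_by_row,
                                                   alpha_by_row)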

3.2. Learning Procedures

Network structure: greedy hill-climbing over candidate networks, at each step applying the single arc change (typically addition, deletion, or reversal) that most improves the score; see the sketch below.
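A generic sketch of the search loop; the slide names the procedure but not its operators, so neighbors is an assumed helper yielding networks one arc change away:

def hillclimb(initial_net, neighbors, score):
    """Greedy hill-climbing: repeatedly move to the best-scoring
    neighbor until no neighbor improves the score, i.e., until a
    local maximum of the MDL/BDe score is reached."""
    current, best = initial_net, score(initial_net)
    improved = True
    while improved:
        improved = False
        for cand in neighbors(current):
            s = score(cand)
            if s > best:
                current, best, improved = cand, s, True
    return current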

Default table: learned by a greedy procedure over which parent values receive explicit rows; see the sketch below.
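The transcript omits the slide's details here; what follows is a plausible greedy scheme for growing the set of explicit rows, not necessarily the paper's exact procedure. score is an assumed helper evaluating a candidate row set (MDL or BDe):

def learn_default_table(parent_vals, score):
    """Start with every parent value in the default row; repeatedly
    promote the value whose explicit row most improves the score;
    stop when no promotion helps."""
    explicit, pool = frozenset(), set(parent_vals)
    best = score(explicit)
    while pool:
        s, v = max((score(explicit | {v}), v) for v in pool)
        if s <= best:
            break
        explicit, best = explicit | {v}, s
        pool.remove(v)
    return explicit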

Decision tree: grown top-down, following Quinlan and Rivest (1989); see the sketch below.
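A sketch of that top-down growth, with split_gain(data, var) (the score improvement of splitting on var) and make_leaf(data) as assumed helpers; it illustrates the style rather than the paper's exact algorithm:

def learn_tree(data, candidates, split_gain, make_leaf):
    """Split on the variable with the best score gain, recurse on
    each branch, and return a leaf when no split helps."""
    best_var = max(candidates, key=lambda v: split_gain(data, v),
                   default=None)
    if best_var is None or split_gain(data, best_var) <= 0:
        return make_leaf(data)
    branches = {}
    for val in {u[best_var] for u in data}:
        subset = [u for u in data if u[best_var] == val]
        branches[val] = learn_tree(subset, candidates - {best_var},
                                   split_gain, make_leaf)
    return (best_var, branches)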

4. Experimental Results

DESCRIPTIONS OF THE NETWORKS USED IN THE EXPERIMENTS

Alarm: for monitoring patients in intensive care; n = 37, |U| =
Hailfinder: for monitoring summer hail in northeastern Colorado; n = 56, |U| =
Insurance: for classifying insurance applications; n = 27, |U| =

(* |U| = Val(U): the set of values U can attain. See Fig. 1.)