A Branch-and-Bound Algorithm for MDL Learning Bayesian Networks


A Branch-and-Bound Algorithm for MDL Learning Bayesian Networks (Chapter 6). Jin Tian, Cognitive Systems Lab, UCLA.

Contents MDL Score Previous algorithms Search Space Depth-First Branch-and-Bound Algorithm Experimental Results

MDL Score Training data set: D = {u1, u2, ..., uN}. Total description length (DL) = length of the description of the model + length of the description of D. MDL principle (Rissanen, 1989): the optimal model is the one that minimizes the total description length.

G: a DAG over U = (X1, ..., Xn); DL = DL(Data) + DL(Model). DL(Model): a penalty for complexity, (log N / 2) * K, where K is the number of parameters needed to represent the conditional probability tables. DL(Data|G): for each case u, use -log P(u|G) as an optimal encoding length (Huffman code); summed over the data this gives the H term, N * (empirical conditional entropy of each X given its parents Pa).
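A rough sketch (not the paper's code) of the local score MDL(X|Pa) = N*H(X|Pa) + (log N / 2)*K computed from counts. The function name, the data layout (one list of values per case), and the use of base-2 logarithms are assumptions for illustration:

```python
import math
from collections import Counter

def local_mdl(data, x, parents, arity):
    # MDL(X|Pa) = N * H(X|Pa) + (log2 N / 2) * K,
    # with K = (r_x - 1) * prod(r_p): free parameters in X's CPT.
    N = len(data)
    k = arity[x] - 1
    for p in parents:
        k *= arity[p]
    # Empirical conditional entropy H(X|Pa), in bits.
    joint = Counter((tuple(row[p] for p in parents), row[x]) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    h = -sum(c / N * math.log2(c / marg[pa]) for (pa, _), c in joint.items())
    return N * h + (math.log2(N) / 2) * k
```

With four cases where X1 copies X0 exactly, the entropy term for Pa = {X0} vanishes and only the parameter penalty remains, while the empty parent set pays the full N*H(X1) data cost.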

Assume a total order X1 < .. < Xn to reduce the search complexity. MDL(G,D) is minimized iff each local score is minimized: find, for each X, a subset Pa of its predecessors that minimizes MDL(X|Pa). [With the order fixed, each parent set can be selected independently.] For each Xi there are 2^(i-1) candidate parent sets to search, for a total of 2^n - 1 sets over all variables.

Previous algorithms K2: Cooper and Herskovits (1992), BD score. K3: K2 with the MDL score. Branch-and-bound: Suzuki (1996). MDL(X|Pa) = H + (log N / 2) * K (K = # parameters for the parent set, H = N * empirical conditional entropy). Adding a node with r states to Pa: K increases by K(old) * (r - 1), while H decreases by no more than H(old); if H(old) is smaller than that penalty increase, the MDL change is positive and further search below is unnecessary. A smaller H speeds up pruning. Lower bound of MDL (since H >= 0): MDL >= (log N / 2) * K.
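Suzuki's stopping idea can be sketched as a single test. The name `suzuki_stop`, its argument list, and the base-2 logarithm are illustrative assumptions, not the paper's interface:

```python
import math

def suzuki_stop(N, h_old, k_old, r_new):
    # Adding a node with r_new states grows the penalty term by
    # (log2 N / 2) * k_old * (r_new - 1), while the data term can fall
    # by at most N * h_old (entropy cannot go below zero). If the best
    # possible gain cannot cover the penalty, extending this parent set
    # cannot lower the MDL score.
    penalty_growth = (math.log2(N) / 2) * k_old * (r_new - 1)
    return N * h_old <= penalty_growth
```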

Search Space Problem: find the subset of Uj = {X1, .. , Xj-1} that minimizes the MDL score. Search space: a set of states and operators. State: a subset of Uj (node). Operator: adding one variable X (edge). In the search tree, a state T with l variables is {Xk1, .. , Xkl} where Xk1 < .. < Xkl are ordered (tree order). A legal operator: adding a single variable that comes after Xkl.

(A search tree for the parents of X5.) The search tree for Xj has 2^(j-1) nodes and the tree depth is j-1.
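The tree-ordered search space can be enumerated with the legal operator alone. In this small sketch (function names assumed), each subset of the candidates is generated exactly once, so 4 candidate parents yield 2^4 = 16 states:

```python
def expand(state, candidates):
    # Children of a state: append one variable that comes after the
    # state's last (largest-index) element -- the only legal operator.
    last = state[-1] if state else -1
    return [state + (c,) for c in candidates if c > last]

def all_states(candidates):
    # Depth-first traversal from the empty parent set; every subset of
    # the candidates appears exactly once, giving 2^(j-1) tree nodes.
    stack = [()]
    while stack:
        s = stack.pop()
        yield s
        stack.extend(expand(s, candidates))
```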

Branch-and-Bound Algorithm In finding the parents of Xj, suppose we are visiting a state T = {.., Xkl}, and let W be the set of remaining variables. We must decide whether to visit the branch below T's child T ∪ {Xq}, Xq ∈ W. Pruning: obtain an initial minMDL from K3 (which is fast) and compare it with a lower bound on the MDL of that branch.

Lower bound (Suzuki): every state below T ∪ {Xq} has MDL >= (log N / 2) * K(T ∪ {Xq}). Better lower bound: MDL >= N * H(Xj | T ∪ W) + (log N / 2) * K(T ∪ {Xq}), since the entropy term can fall no lower than the entropy given all remaining variables. Pruning: if this lower bound >= the current minMDL, all branches below T ∪ {Xq} can be pruned.
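A depth-first branch-and-bound over this tree might be sketched as follows. `h_given` and `k_of` (the empirical conditional entropy and parameter count for a parent set) are assumed to be supplied by the caller, K is assumed monotone in the parent set, and base-2 logs are again an assumption:

```python
import math

def dfbnb_parents(candidates, N, h_given, k_of):
    # Sketch: minimize score(Pa) = N * H(Xj|Pa) + (log2 N / 2) * K(Pa)
    # over tree-ordered parent sets, pruning with the lower bound above.
    def score(pa):
        return N * h_given(pa) + (math.log2(N) / 2) * k_of(pa)

    best_pa, best = (), score(())
    h_floor = h_given(tuple(candidates))  # H can drop no lower than this
    stack = [()]
    while stack:
        t = stack.pop()
        last = t[-1] if t else -1
        for q in candidates:
            if q <= last:
                continue  # tree order: only append later variables
            child = t + (q,)
            s = score(child)
            if s < best:
                best_pa, best = child, s
            # Lower bound for every state below `child`: entropy term at
            # its floor, penalty term at least that of `child` itself.
            bound = N * h_floor + (math.log2(N) / 2) * k_of(child)
            if bound < best:
                stack.append(child)  # branch may still improve; descend
            # otherwise the whole subtree below `child` is pruned
    return best_pa, best
```

On a toy problem where candidate 0 fully determines Xj, the search settles on the single parent {X0} and prunes the larger sets.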

Node ordering: in the tree order Xk1 < .. < Xk(j-1), Xk1 appears in the fewest states and Xk(j-1) in the most. Choose the order so that H(Xj|Xk1) <= H(Xj|Xk2) <= .. <= H(Xj|Xk(j-1)). Result: most of the lower bounds take larger values, so fewer states are visited.
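The ordering heuristic is just a sort by conditional entropy; `h_pair`, assumed here to return H(Xj|Xk) for a single candidate Xk, is a hypothetical helper:

```python
def choose_tree_order(candidates, h_pair):
    # Sort so that H(Xj|Xk1) <= .. <= H(Xj|Xk(j-1)): candidates whose
    # single-parent conditional entropy is smallest get the smallest
    # tree index, and thus appear in the fewest branching states.
    return tuple(sorted(candidates, key=h_pair))
```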

Empirical Results Networks used: ALARM (37 nodes, 46 edges), Boerlage92 (23 nodes, 36 edges), Car-Diagnosis_2 (18 nodes, 20 edges), Hailfinder2.5 (56 nodes, 66 edges), A (54 nodes, dense edges), B (18 nodes, 39 edges).