Bayesian Optimization Algorithm, Decision Graphs, and Occam's Razor Martin Pelikan, David E. Goldberg, and Kumara Sastry IlliGAL Report No. 2000020, May 2000.


Abstract This report discusses the use of various scoring metrics for Bayesian networks, the use of decision graphs in Bayesian networks to improve the performance of the BOA, and a BDe metric for Bayesian networks with decision graphs.

Bayesian Networks
Two basic components of learning Bayesian networks:
– A scoring metric that discriminates among candidate networks.
– A search algorithm that finds a network with a high scoring-metric value.
BOA (in previous work):
– The complexity of the considered models was bounded by the maximum number of incoming edges into any node.
– To search the space of networks, a simple greedy algorithm was used due to its efficiency.

Bayesian-Dirichlet Metric
The BDe metric combines prior knowledge about the problem with the statistical data from a given data set, via Bayes' theorem. The higher p(B|D) is, the more likely it is that the network B is a correct model of the data, so p(B|D), the posterior probability of B, serves as the Bayesian scoring metric. Moreover, since we use a fixed data set D, the term p(D) is the same for all networks and can be ignored when comparing them.
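The formula on this slide did not survive the transcript; as a reconstruction, Bayes' theorem applied to a network B and data D reads:

```latex
% Bayes' theorem for the posterior probability of a network B given data D.
% Since D is fixed, p(D) is constant and networks can be compared by
% the product p(D|B) p(B) alone.
p(B \mid D) = \frac{p(D \mid B)\, p(B)}{p(D)}
\;\propto\; p(D \mid B)\, p(B)
```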

Bayesian-Dirichlet Metric
p(B): the prior probability of the network B. Through this prior, the BDe metric can give preference to simpler networks, but this pressure alone is not enough!

Bayesian-Dirichlet Metric
Assumptions under which p(B|D) can be computed in closed form:
– The data are a multinomial sample.
– The parameters are independent:
  1. The parameters associated with each variable are independent (global parameter independence).
  2. The parameters associated with each instance of the parents of a variable are independent (local parameter independence).
– The parameter priors follow a Dirichlet distribution.
– There are no missing data (complete data).

Bayesian-Dirichlet Metric
With uninformative (uniform) priors, this metric is often referred to as the K2 metric.
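The metric's formula was lost from the transcript; a standard statement of the BD metric in the notation common to the BOA papers (a reconstruction, not verbatim from these slides) is:

```latex
% BD metric: marginal likelihood of the data D given the network B.
% m(pi_i)      = number of instances with the parents Pi_i set to pi_i
% m(x_i, pi_i) = number of instances with X_i = x_i and parents = pi_i
% primed m'    = corresponding prior counts; m'(x_i, pi_i) = 1 gives K2
p(D \mid B) = \prod_{i=1}^{n} \prod_{\pi_i}
  \frac{\Gamma\!\big(m'(\pi_i)\big)}{\Gamma\!\big(m'(\pi_i) + m(\pi_i)\big)}
  \prod_{x_i}
  \frac{\Gamma\!\big(m'(x_i,\pi_i) + m(x_i,\pi_i)\big)}{\Gamma\!\big(m'(x_i,\pi_i)\big)}
```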

Minimum Description Length Metric
A drawback of the MDL metric is that it is not good at using prior information about the problem.
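No formula appears on this slide; a common two-part form of the MDL score (a generic reconstruction, not necessarily the exact variant used in the report) is:

```latex
% MDL: total description length = model length + data length given model.
% DL(B) bits encode the structure, |Theta_B|/2 log N bits encode the
% parameters (N = number of instances), and the last term is the
% length of the data compressed under the model.
MDL(B, D) = DL(B) + \frac{|\Theta_B|}{2}\log_2 N
            - \log_2 p\big(D \mid B, \hat{\Theta}_B\big)
```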

Constructing a Network
Constructing the best network for given data is NP-complete. However, most of the commonly used metrics can be decomposed into independent terms, each of which corresponds to one variable. Empirical results show that more sophisticated search algorithms do not improve the obtained results significantly, so a simple greedy algorithm suffices (see the sketch below).
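A minimal sketch of such a greedy, edge-adding search with a decomposable score, for illustration only; `local_score` and `max_parents` are assumed placeholders, not the paper's interface:

```python
# Greedy construction of a network under a decomposable score:
# score(network) = sum over variables i of local_score(i, parents[i]).
def greedy_network(n_vars, local_score, max_parents):
    parents = {i: set() for i in range(n_vars)}

    def creates_cycle(parent, child):
        # Adding parent -> child closes a cycle iff child is already an
        # ancestor of parent (reachable by following parent links).
        stack, seen = [parent], set()
        while stack:
            node = stack.pop()
            if node == child:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(parents[node])
        return False

    while True:
        best_gain, best_edge = 0.0, None
        for child in range(n_vars):
            if len(parents[child]) >= max_parents:
                continue  # respect the bound on incoming edges
            base = local_score(child, parents[child])
            for parent in range(n_vars):
                if parent == child or parent in parents[child]:
                    continue
                if creates_cycle(parent, child):
                    continue
                gain = local_score(child, parents[child] | {parent}) - base
                if gain > best_gain:
                    best_gain, best_edge = gain, (parent, child)
        if best_edge is None:
            return parents  # no edge addition improves the score
        parent, child = best_edge
        parents[child].add(parent)
```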

Decision Graphs in Bayesian Networks
The use of local structures such as decision trees, decision graphs, and default tables to represent equalities among parameters was proposed in earlier work. The network construction algorithm can take advantage of decision graphs by manipulating the network structure directly through the graphs.

Decision Graphs A decision graph is an extension of a decision tree in which each non-root node can have multiple parents.
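To make this concrete, here is a minimal sketch of a decision graph over binary variables; the class names, fields, and example numbers are illustrative, not from the paper:

```python
# A decision graph for one variable's conditional distribution. Internal
# nodes test a parent variable; leaves store the parameters. Because a
# leaf may have multiple parents, equal parameters are stored only once.
from dataclasses import dataclass, field

@dataclass
class Leaf:
    prob_one: float          # P(x_i = 1 | constraints on the path here)

@dataclass
class Split:
    var: int                 # index of the parent variable tested here
    children: dict = field(default_factory=dict)  # value -> Leaf or Split

def lookup(node, assignment):
    """Follow the decision graph to the leaf matching `assignment`."""
    while isinstance(node, Split):
        node = node.children[assignment[node.var]]
    return node.prob_one

# Example: the cases x0=0 and (x0=1, x1=0) share one leaf -- a structure
# a decision tree could only express by duplicating the leaf.
shared = Leaf(0.2)
graph = Split(var=0, children={0: shared,
                               1: Split(var=1, children={0: shared,
                                                         1: Leaf(0.9)})})
assert lookup(graph, {0: 1, 1: 1}) == 0.9
```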

Advantages of Decision Graphs
– Far fewer parameters can be used to represent a model.
– A more complex class of models, such as Bayesian multinets, can be learned.
– The search performs smaller and more specific steps, which results in better models with respect to their likelihood.
– A network complexity measure can be incorporated into the scoring metric.

Bayesian Score for Networks with Decision Graphs
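The formula on this slide was lost in the transcript; the BDe score for networks with decision graphs, in the form given in the BOA literature (a reconstruction), replaces the product over parent configurations with a product over the leaves of each decision graph:

```latex
% L_i          = set of leaves of the decision graph G_i for variable X_i
% m_i(l)       = number of instances in D that traverse to leaf l
% m_i(x_i, l)  = number of those instances with X_i = x_i
% primed m'    = corresponding prior (Dirichlet) counts
p(D \mid B) = \prod_{i=1}^{n} \prod_{l \in L_i}
  \frac{\Gamma\!\big(m_i'(l)\big)}{\Gamma\!\big(m_i'(l) + m_i(l)\big)}
  \prod_{x_i}
  \frac{\Gamma\!\big(m_i'(x_i, l) + m_i(x_i, l)\big)}{\Gamma\!\big(m_i'(x_i, l)\big)}
```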

Operators on Decision Graphs
– Split: replaces a leaf by an internal node that tests some variable, with one new leaf for each of its values.
– Merge: replaces two leaves by a single leaf, making their parameters identical.

Constructing BN with DG
1. Initialize the decision graph G_i for each node x_i to a graph containing only a single leaf.
2. Initialize the network B to an empty network.
3. Choose the best split or merge that does not result in a cycle in B.
4. If the best operator does not improve the score, finish.

Constructing BN with DG (continued)
5. Execute the chosen operator.
6. If the operator was a split, update the network B by adding a new edge.
7. Go to step 3 (see the sketch below).
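A minimal, runnable sketch of steps 1-7 for binary variables, assuming a K2-style leaf score (uniform prior counts) and, for brevity, only the split operator; merge candidates would be scored the same way. All names and the data layout (a list of 0/1 tuples) are illustrative assumptions, not the paper's implementation:

```python
import math

def log_leaf_score(m0, m1):
    # Log of the BDe/K2 term for one binary leaf with counts m0, m1:
    # Gamma(2)/Gamma(2 + m0 + m1) * Gamma(1 + m0) * Gamma(1 + m1)
    return (math.lgamma(2) - math.lgamma(2 + m0 + m1)
            + math.lgamma(1 + m0) + math.lgamma(1 + m1))

def build_network(data, n_vars):
    # Steps 1-2: one single-leaf decision graph per variable (each leaf is
    # stored as the list of data-row indices reaching it) and an empty
    # network B, represented by per-variable parent sets.
    all_rows = list(range(len(data)))
    leaves = {i: [list(all_rows)] for i in range(n_vars)}
    parents = {i: set() for i in range(n_vars)}

    def creates_cycle(p, c):
        # Adding edge p -> c closes a cycle iff c is already an ancestor
        # of p, i.e. reachable from p by following parent links.
        stack, seen = [p], set()
        while stack:
            v = stack.pop()
            if v == c:
                return True
            if v not in seen:
                seen.add(v)
                stack.extend(parents[v])
        return False

    def score(i, leaf_rows):
        ones = sum(data[r][i] for r in leaf_rows)
        return log_leaf_score(len(leaf_rows) - ones, ones)

    while True:
        best = None                                  # step 3: scan splits
        for i in range(n_vars):
            for li, leaf_rows in enumerate(leaves[i]):
                for j in range(n_vars):
                    if j == i or creates_cycle(j, i):
                        continue
                    part0 = [r for r in leaf_rows if data[r][j] == 0]
                    part1 = [r for r in leaf_rows if data[r][j] == 1]
                    if not part0 or not part1:
                        continue                     # vacuous split
                    gain = (score(i, part0) + score(i, part1)
                            - score(i, leaf_rows))
                    if best is None or gain > best[0]:
                        best = (gain, i, li, j, part0, part1)
        if best is None or best[0] <= 0:
            return parents, leaves                   # step 4: no improvement
        _, i, li, j, part0, part1 = best
        leaves[i][li:li + 1] = [part0, part1]        # step 5: execute split
        parents[i].add(j)                            # step 6: add edge j -> x_i
```

For instance, build_network([(0, 0), (1, 1), (1, 1), (0, 0)], 2) splits once and returns a network with a single edge between the two correlated bits; the reverse edge is then rejected by the cycle check.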

Experiments
Test problems: one-max, 3-deceptive, spin-glass, and graph bisection.