Model Averaging with Discrete Bayesian Network Classifiers

Denver Dash and Gregory F. Cooper
In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics (AISTATS 2003)

Contents
- Model averaging over a class of discrete Bayesian network classifiers
  - Consistent with a partial ordering and a bounded in-degree k.
- Theoretical results (for N nodes)
  - The class has at least […] distinct structures.
  - The summation can be performed in […] time.
  - Approximate averaging in O(N) time.
- Experiments
  - The technique can be beneficial even when the generating distribution is not a member of the class.
  - Performance is characterized over several parameters.
(c) 2003 SNU CSE Biointelligence Lab

Bayesian network classifiers
- Naïve Bayes classifier (figure: nodes C, F1, F2, …, FN)
- General Bayesian network classifiers (figure: nodes C, F1, F2, …, FN)
- Optimal in zero-one loss
- Poor generalization performance could be improved by Bayesian model averaging → the space of network structures is super-exponential.

In this paper
- Bayesian model averaging (MA) over a restricted class of Bayesian network classifiers
  - Restricted by a partial order (π) and a bounded in-degree (k).
- Contributions
  - Apply the factorization of the conditionals to the task of classification.
  - Show that MA over this class can be approximated by a single network S* → calculation in O(N) time.
  - Empirical evaluation of the method compared with
    - A single naïve Bayes classifier
    - A single Bayesian network learned by a greedy search
    - Exact MA on naïve Bayes classifiers.

Notations
- The classification problem
  - A set of features F = {F1, F2, …, FN}.
  - X0 = C, X1 = F1, …, XN = FN → X (in Bayesian networks)
  - A set of classes C = {C1, C2, …, CNC}.
  - A database D = {D1, D2, …, DR}.
- A Bayesian network
  - G(X): a DAG structure
  - Xi: a multinomial distribution
  - Pi: the parents of Xi
  - A parameter θijk
  - Parameter set θ
- Other assumptions: parameter independence, Dirichlet priors, …

Fixed network structures
- With the fixed network parameters θ
- Bayesian averaging over the parameters with conjugate priors
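As a minimal sketch of averaging over parameters with a conjugate prior: for a multinomial variable with a Dirichlet prior, the posterior-mean parameter is the prior pseudo-count plus the observed count, normalized. The function name, the counts, and the prior values below are illustrative assumptions, not taken from the slides.

```python
def dirichlet_posterior_mean(counts, alphas):
    """Posterior-mean multinomial parameters under a Dirichlet prior.

    counts[k]: number of cases with the variable in state k
               (for one fixed parent configuration).
    alphas[k]: Dirichlet pseudo-count for state k (hypothetical values).
    """
    if len(counts) != len(alphas):
        raise ValueError("counts and alphas must have the same length")
    total = sum(counts) + sum(alphas)
    return [(n + a) / total for n, a in zip(counts, alphas)]


# Hypothetical data: a binary variable seen 3 times in state 0, once in state 1,
# with a uniform Dirichlet(1, 1) prior.
theta_hat = dirichlet_posterior_mean(counts=[3, 1], alphas=[1.0, 1.0])
print(theta_hat)
```

The same computation would be repeated for every variable and every configuration of its parents, which is what parameter independence permits.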

Averaging with a fixed ordering (1)
- For a structural feature, e.g. XL → XM
- The posterior probability P(XL → XM|D)
- The structure modularity
- The marginal likelihood (decomposable)

Averaging with a fixed ordering (2)
- Then, the posterior probability of a structural feature can be represented as […]

Averaging with a fixed ordering (3)
- Enumerating the possible parents of Xi given a partial ordering:
  - π: <{X1, X3}, {X2, X4}>, k = 2.
  - P2^0 = ∅, P2^1 = {X1}, P2^2 = {X3}, P2^3 = {X1, X3}.
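The enumeration in this example can be sketched as follows. The sketch assumes that a node may only take parents from blocks that come strictly earlier in the partial ordering, which is what the listed parent sets of X2 suggest. The function name and the list-of-blocks representation are illustrative choices.

```python
from itertools import combinations


def candidate_parent_sets(node, ordering, k):
    """Return all parent sets of `node` with at most k members.

    ordering: list of blocks (lists of node names); parents are assumed to
              come only from blocks strictly before the block holding `node`.
    """
    predecessors = []
    for block in ordering:
        if node in block:
            break
        predecessors.extend(block)
    else:
        raise ValueError(f"{node!r} does not appear in the ordering")

    parent_sets = []
    for size in range(min(k, len(predecessors)) + 1):
        parent_sets.extend(frozenset(c) for c in combinations(predecessors, size))
    return parent_sets


# The ordering and in-degree bound from the slide.
ordering = [["X1", "X3"], ["X2", "X4"]]
parent_sets_x2 = candidate_parent_sets("X2", ordering, k=2)
for parents in parent_sets_x2:
    print(sorted(parents))
```

For X2 this yields the four sets listed on the slide: the empty set, {X1}, {X3}, and {X1, X3}.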

Averaging with a fixed ordering (4)

Averaging with a fixed ordering (5)

Averaging with a fixed ordering (6)
- Dynamic programming solution
- Finally,
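The reason a fixed ordering makes the sum over structures tractable can be illustrated with a small numerical check. With the ordering fixed, every combination of per-node parent sets is acyclic, so a sum over structures of a product of local scores equals a product of per-node sums. The local scores below are made-up numbers standing in for f(Xi, Pi|D) with any modular structure prior folded in. The function names are hypothetical, and this is a sketch of the general idea rather than the slides' own derivation.

```python
from itertools import product
from math import isclose, prod

# Hypothetical local scores f(X_i, parent set | D) for the ordering
# <{X1, X3}, {X2, X4}> with k = 2.
local_scores = {
    "X1": {frozenset(): 1.0},
    "X3": {frozenset(): 1.0},
    "X2": {frozenset(): 0.2, frozenset({"X1"}): 0.5,
           frozenset({"X3"}): 0.1, frozenset({"X1", "X3"}): 0.4},
    "X4": {frozenset(): 0.3, frozenset({"X1"}): 0.3,
           frozenset({"X3"}): 0.6, frozenset({"X1", "X3"}): 0.2},
}


def brute_force_sum(scores, edge=None):
    """Sum of structure scores over every structure (optionally with `edge`)."""
    nodes = list(scores)
    total = 0.0
    for choice in product(*(scores[n].items() for n in nodes)):
        structure = {n: parents for n, (parents, _) in zip(nodes, choice)}
        if edge is not None and edge[0] not in structure[edge[1]]:
            continue  # structure lacks the edge parent -> child
        total += prod(score for _, score in choice)
    return total


def factored_sum(scores, edge=None):
    """Same quantity, computed as a product of per-node sums."""
    total = 1.0
    for node, table in scores.items():
        total *= sum(score for parents, score in table.items()
                     if edge is None or node != edge[1] or edge[0] in parents)
    return total


edge = ("X1", "X2")  # the structural feature X1 -> X2
edge_posterior = factored_sum(local_scores, edge) / factored_sum(local_scores)
print(edge_posterior)
assert isclose(brute_force_sum(local_scores, edge), factored_sum(local_scores, edge))
```

The brute-force loop visits every structure, while the factored version touches each candidate parent set once, so its cost grows with the number of parent sets per node rather than with the number of structures.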

Model averaging for predictions
- The probability of a new example can be calculated similarly to the probability of a structural feature.
- Hence,
- The parameter value θijk is used in place of the Kronecker delta function.
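One way to write the model-averaged prediction, under the assumptions stated on the earlier slides (fixed ordering, structure modularity, parameter independence, Dirichlet priors), is sketched below. The symbols \hat{\theta}, j_\nu and the index \nu over candidate parent sets are notation introduced here for illustration. This is a reconstruction from those assumptions, not the formula from the slide.

```latex
% x: a new complete example; k_i: the state of X_i in x;
% j_\nu: the configuration of the parent set P_i^\nu in x;
% \hat{\theta}^{\nu}_{i j_\nu k_i}: posterior-mean parameter under parent set P_i^\nu.
P(x \mid D)
  = \sum_{G} P(x \mid G, D)\, P(G \mid D)
  = \prod_{i=0}^{N}
    \frac{\sum_{\nu} f(X_i, P_i^{\nu} \mid D)\, \hat{\theta}^{\nu}_{i j_\nu k_i}}
         {\sum_{\nu} f(X_i, P_i^{\nu} \mid D)}

% Classification then follows by normalizing over the class values:
P(C = c \mid F, D) = \frac{P(c, F \mid D)}{\sum_{c'} P(c', F \mid D)}
```

Compared with the structural-feature posterior, the indicator that selects structures containing the feature is replaced by the parameter value for the observed states, which matches the slide's remark about the Kronecker delta.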

Approximation on the model averaging
- The time bound is still severe even for moderate cases (k = 3 or 4).
- One approximation: order the set of possible parent sets for Xi based on the function f(Xi, Pi^ν|D) and prune them.
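The pruning step can be sketched as ranking the candidate parent sets of a node by their local score and keeping only the best few. The slide does not state the cutoff rule, so keeping a fixed number of top-scoring sets is an assumption here, and the scores are made-up numbers.

```python
def prune_parent_sets(scored_sets, keep):
    """Keep the `keep` highest-scoring parent sets of one node.

    scored_sets: dict mapping a parent set to its local score f(X_i, P_i | D).
    keep: number of parent sets to retain (assumed cutoff rule).
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ranked = sorted(scored_sets.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:keep])


# Hypothetical local scores for the candidate parent sets of X2.
scores_x2 = {
    frozenset(): 0.2,
    frozenset({"X1"}): 0.5,
    frozenset({"X3"}): 0.1,
    frozenset({"X1", "X3"}): 0.4,
}
pruned = prune_parent_sets(scores_x2, keep=2)
retained_mass = sum(pruned.values()) / sum(scores_x2.values())
print(sorted(sorted(parents) for parents in pruned), retained_mass)
```

The retained fraction of the total score gives a rough indication of how much of the per-node sum the pruned list still covers.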

Experimental evaluation (1)
- Performance metric: δ = (R1 – R2) / (T – R2)
- Synthetic data sets
- Comparisons between exact averaging and approximation

Experimental evaluation (2)
- Approximate model averaging vs. greedy thick-thin search

Experimental evaluation (3)
- Synthetic data from the ALARM network
- Approximate model averaging (AMA) vs. greedy thick-thin search (GTT)

Experimental evaluation (4)
- Real classification data sets from the UCI repository

Discussion
- Approximate model averaging outperforms a single BN classifier.
- Simplicity of the implementation.
- Future work
  - Find a better method for optimizing the ordering.
  - Applications to real-world problems.
  - Relax the assumption of complete data.