Learning recursive Bayesian multinets for data clustering by means of constructive induction
Peña, J.M., Lozano, J.A., and Larrañaga, P. Machine Learning, 47(1), pp. 63-89, 2002.
Summarized by Kyu-Baek Hwang

The data clustering problem
- data partitioning: k-means
- representing the joint probability distribution of a database: mixture density models (e.g., Gaussian mixtures), Bayesian networks
[Figure: a Bayesian network with cluster node C and predictive attributes Y1-Y5]

Recursive Bayesian multinets (RBMNs)
- a decision tree in which each decision path ends in a distinct component Bayesian network (BN)
- encode context-specific conditional independencies

BNs for data clustering
- the joint probability distribution of the database is represented by a BN
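To make the slide concrete: with a hidden cluster variable C and predictive attributes Y1, ..., Yn, the BN encodes the joint distribution as a product of local conditionals. A minimal sketch of the standard factorization (notation assumed, not taken from the slides):

```latex
% Joint distribution encoded by a BN for clustering: C is the hidden cluster
% variable, Y_i the attributes, and pa(Y_i) the parents of Y_i in the structure.
p(c, y_1, \ldots, y_n) = p(c) \prod_{i=1}^{n} p\big(y_i \mid \mathrm{pa}(y_i)\big)
```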

Bayesian multinets (BMNs)
- encode the context-specific conditional independencies
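In a BMN for clustering, each value c of the cluster variable has its own component BN structure S_c, so the conditional independencies among the attributes may differ from cluster to cluster. A hedged sketch of the resulting joint distribution (my notation, not the slides'):

```latex
% BMN joint distribution: each cluster value c selects its own structure S_c,
% so pa_{S_c}(Y_i) -- and hence the independencies -- can vary with c.
p(c, y_1, \ldots, y_n) = p(c) \prod_{i=1}^{n} p\big(y_i \mid \mathrm{pa}_{S_c}(y_i), c\big)
```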

RBMNs
- extensions of BMNs or partitional clustering systems

Real-world domain
- the geographical distribution of malignant tumors

Component BN structures: extended naïve Bayes (ENB) models
- selection of the attributes to be included in the models (X: not considered here)
- some attributes can be grouped together under the same node (O: considered here)

Learning algorithm for ENB models
[Algorithm figure in the original slide]

Parameter search
- EM (expectation maximization) algorithm
- BC (bound and collapse) + EM algorithm (O: the variant used here)
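For intuition about the EM half of this step, below is a minimal, hedged sketch of plain EM for a naïve Bayes clustering model over categorical attributes. It is not the paper's BC + EM procedure (bound and collapse is not shown), and every identifier is my own illustration:

```python
# Minimal, hedged EM sketch for a naive Bayes clustering model with
# categorical attributes. Only the EM half of the slide's BC + EM is shown.
import numpy as np

def em_naive_bayes(data, n_clusters, n_values, max_iter=150, tol=1e-6, seed=0):
    """data: (N, A) int array with values in {0, ..., n_values - 1}."""
    rng = np.random.default_rng(seed)
    n_cases, n_attrs = data.shape
    prior = np.full(n_clusters, 1.0 / n_clusters)               # p(c)
    cpt = rng.dirichlet(np.ones(n_values),
                        size=(n_attrs, n_clusters))             # p(y_i | c)
    prev_ll = -np.inf
    for _ in range(max_iter):                     # at most 150 iterations
        # E-step: log p(c, y_n) per case, normalized to p(c | y_n).
        log_joint = np.tile(np.log(prior), (n_cases, 1))        # (N, C)
        for i in range(n_attrs):
            log_joint += np.log(cpt[i][:, data[:, i]]).T        # add log p(y_i | c)
        ll = np.logaddexp.reduce(log_joint, axis=1)             # log p(y_n)
        resp = np.exp(log_joint - ll[:, None])                  # responsibilities
        # M-step: expected counts with Laplace smoothing
        # (mimicking a uniform Dirichlet prior on the parameters).
        prior = resp.sum(axis=0) / n_cases
        for i in range(n_attrs):
            counts = np.ones((n_clusters, n_values))
            for v in range(n_values):
                counts[:, v] += resp[data[:, i] == v].sum(axis=0)
            cpt[i] = counts / counts.sum(axis=1, keepdims=True)
        if ll.sum() - prev_ll < tol:    # change < 1e-6, as on the setup slide
            break
        prev_ll = ll.sum()
    return prior, cpt, resp
```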

Structure search: constructive induction
- constructive induction: the process of changing the representation of the cases in the database by creating new attributes from existing attributes
- forward algorithm and backward algorithm (see the sketch below)
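As an illustration of the forward direction, here is a hedged sketch that greedily joins pairs of attributes into compound nodes while a model score improves; the backward algorithm would run in the opposite direction, splitting a compound node apart when that improves the score. The `score` callback stands in for the marginal-likelihood criterion, and all names are mine, not the paper's:

```python
# Hedged sketch of a forward constructive-induction pass: greedily join pairs
# of (possibly compound) attributes into one node while the score improves.
from itertools import combinations

def forward_constructive_induction(attributes, score):
    """attributes: iterable of attribute names; score: callable on a grouping."""
    groups = [frozenset([a]) for a in attributes]
    best = score(groups)
    improved = True
    while improved:
        improved = False
        for g1, g2 in combinations(groups, 2):
            candidate = [g for g in groups if g not in (g1, g2)] + [g1 | g2]
            s = score(candidate)
            if s > best:                        # remember the best join so far
                best, best_groups, improved = s, candidate, True
        if improved:
            groups = best_groups                # commit one join per pass
    return groups, best
```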

Marginal likelihood criterion for RBMNs
- with an uninformative Dirichlet prior, for BMNs: [equation not preserved in the transcript]
- with some reasonable assumptions, including parameter independence, for RBMNs: [equation not preserved in the transcript]
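The slide's formulas did not survive transcription. For reference, the standard closed-form marginal likelihood of a discrete BN under Dirichlet priors (the Cooper-Herskovits result that criteria of this kind build on) looks as follows; this is my reconstruction of the generic formula, not the slide's exact expression:

```latex
% Marginal likelihood of a discrete BN structure S with Dirichlet priors:
% q_i = number of parent configurations of node i, r_i = number of its states,
% N_{ijk} = count of cases with node i in state k under parent config j,
% N_{ij} = \sum_k N_{ijk}, and \alpha_{ijk}, \alpha_{ij} the prior counts.
p(D \mid S) = \prod_{i=1}^{n} \prod_{j=1}^{q_i}
  \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})}
  \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}
```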

Learning algorithm for RBMNs
[Algorithm figure in the original slide; a high-level sketch follows]
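A hedged, high-level sketch of what such a recursive learner looks like: grow the decision tree top-down, and at each node either stop and fit a component model or split on the attribute that most improves the score. The callbacks (`learn_component`, `score`, `split_on`) are hypothetical stand-ins for the paper's subroutines (ENB learning, marginal likelihood, data partitioning), not its exact procedure:

```python
# Hedged top-down sketch of RBMN learning as a higher-order function: at each
# tree node, compare fitting a single component BN against splitting on each
# candidate attribute, and recurse on the best split. All callbacks are
# hypothetical stand-ins supplied by the caller.
def learn_rbmn(data, attributes, learn_component, score, split_on, max_depth=2):
    """attributes: set of attribute names still available for splitting."""
    leaf_model = learn_component(data, attributes)
    best_score, best_attr = score(leaf_model, data), None
    if max_depth > 0:
        for attr in attributes:
            parts = split_on(data, attr)            # one sub-database per value
            s = sum(score(learn_component(p, attributes - {attr}), p)
                    for p in parts)
            if s > best_score:
                best_score, best_attr = s, attr
    if best_attr is None:
        return ('leaf', leaf_model)                 # path ends in a component BN
    children = [learn_rbmn(p, attributes - {best_attr}, learn_component,
                           score, split_on, max_depth - 1)
                for p in split_on(data, best_attr)]
    return ('split', best_attr, children)
```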

Experimental setup
- both synthetic data and real data
- discrete variables with (unrestricted) multinomial distributions
- convergence criterion for the BC + EM algorithm: change in the log marginal likelihood value is less than 10^-6, or 150 iterations
- fixing_probability_threshold: 0.51
- initial structure: naïve Bayes model
- 5 independent runs for each experiment

1-level RBMNs for the experiments

2-level RBMNs for the experiments

Performance for 4 synthetic databases

Performance for real-world data
- tic-tac-toe data: 2 clusters with 9 predictive variables, 958 cases
- nursery data: 5 clusters with 8 predictive variables, 12,960 cases

Conclusions and future research
- context-specific conditional independencies, data partitioning
- efficient representation: Bayesian committees, mixture of experts
- learning speed problem: a trade-off against the efficient representation
- monothetic decision tree → polythetic paths, to enrich the modeling power
- extensions to the continuous domain