Variational Bayesian Model Selection for Mixture Distributions


Variational Bayesian Model Selection for Mixture Distributions
Authors: Adrian Corduneanu & Christopher M. Bishop
Presented by Shihao Ji, Duke University Machine Learning Group, Jan. 20, 2006

Outline
- Introduction: model selection
- Automatic Relevance Determination (ARD)
- Experimental Results
- Application to HMMs

Introduction
Approaches to model selection:
- Cross-validation
- Bayesian approaches: MCMC and the Laplace approximation
- (Traditional) variational method
- (Type II) variational method

Automatic Relevance Determination (ARD): relevance vector regression
Given a dataset {(x_n, t_n)}_{n=1}^N, we assume the targets are Gaussian-distributed around a linear model.
Likelihood: p(t | w, β) = ∏_n N(t_n | w^T φ(x_n), β^{-1})
Prior: p(w | α) = ∏_i N(w_i | 0, α_i^{-1}), one precision α_i per weight
Posterior: p(w | t, α, β) = N(w | m, Σ), with Σ = (β Φ^T Φ + diag(α))^{-1} and m = β Σ Φ^T t
Determination of the hyperparameters α, β: Type II ML (maximize the marginal likelihood p(t | α, β)); a weight whose α_i diverges is switched off.
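The type II ML step can be sketched with the classic evidence fixed-point iteration (MacKay-style updates). This is the standard ARD update for linear regression, not anything specific to this talk; the noise precision β is held fixed for brevity:

```python
import numpy as np

def ard_regression(Phi, t, beta=25.0, n_iter=100, alpha_max=1e6):
    """Type II ML (evidence) updates for ARD linear regression.
    Returns the posterior mean m and precisions alpha; a large
    alpha_i marks basis function i as irrelevant."""
    N, M = Phi.shape
    alpha = np.ones(M)
    for _ in range(n_iter):
        # Posterior over weights: N(m, Sigma)
        Sigma = np.linalg.inv(beta * Phi.T @ Phi + np.diag(alpha))
        m = beta * Sigma @ Phi.T @ t
        # MacKay fixed-point update: gamma_i = 1 - alpha_i * Sigma_ii
        gamma = 1.0 - alpha * np.diag(Sigma)
        alpha = np.minimum(gamma / (m ** 2 + 1e-12), alpha_max)
    return m, alpha
```

Irrelevant basis functions are revealed by their α_i growing without bound (capped here at alpha_max) while their posterior means shrink toward zero.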

Automatic Relevance Determination (ARD): mixture of Gaussians
Given an observed dataset X = {x_1, …, x_N}, we assume each data point is drawn independently from a mixture-of-Gaussians density.
Likelihood: p(x_n | π, θ) = Σ_k π_k N(x_n | μ_k, Σ_k)
Prior: conjugate priors on the component parameters {μ_k, Σ_k}
Posterior: intractable; approximated by VB
Determination of the mixing coefficients π: Type II ML (maximize the variational lower bound with respect to π)
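As an illustration of the effect (not the paper's exact scheme: scikit-learn's `BayesianGaussianMixture` places a Dirichlet-process prior on the mixing coefficients rather than optimizing them by type II ML), a VB mixture fit with surplus components concentrates nearly all the mixing weight on the components that are actually needed:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# 600 points from 3 well-separated 2-D Gaussians, fitted with 10 components
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(200, 2))
               for c in ([-5.0, 0.0], [0.0, 5.0], [5.0, 0.0])])

vb = BayesianGaussianMixture(n_components=10,
                             weight_concentration_prior_type="dirichlet_process",
                             weight_concentration_prior=0.1,
                             max_iter=500, random_state=0).fit(X)

# Surplus components receive near-zero mixing weight
top_weight = np.sort(vb.weights_)[::-1][:3].sum()
```

After fitting, `vb.weights_` has three dominant entries and seven near-zero ones, mirroring the component elimination described in the talk.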

Automatic Relevance Determination (ARD): model selection
Bayesian method: start with more components than needed and let the evidence prune the excess.
Component elimination: if π_k → 0, i.e., component k takes no responsibility for any data point, it drops out of the model.
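Once the mixing coefficients are in hand, the elimination rule is trivial to apply; a minimal sketch (the threshold `tol` is an arbitrary illustrative choice, not a value from the talk):

```python
import numpy as np

def prune_components(weights, params, tol=1e-2):
    """Drop mixture components whose mixing coefficient has collapsed
    below tol, renormalizing the surviving coefficients."""
    keep = weights > tol
    w = weights[keep]
    return w / w.sum(), [p for p, k in zip(params, keep) if k]
```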

Experimental Results: Bayesian method vs. cross-validation
600 points drawn from a mixture of 5 Gaussians.
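The cross-validation side of this comparison can be sketched by scoring held-out log-likelihood over candidate component counts; the synthetic data below merely stands in for the talk's 5-Gaussian set, and all names are illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)
# 600 points from a mixture of 5 well-separated 2-D Gaussians
centers = np.array([[-6, 0], [-3, 4], [0, 0], [3, 4], [6, 0]], dtype=float)
X = np.vstack([rng.normal(loc=c, scale=0.6, size=(120, 2)) for c in centers])

def cv_loglik(X, k, folds=5):
    """Mean held-out log-likelihood of a k-component Gaussian mixture."""
    kf = KFold(n_splits=folds, shuffle=True, random_state=0)
    scores = []
    for tr, te in kf.split(X):
        gm = GaussianMixture(n_components=k, random_state=0).fit(X[tr])
        scores.append(gm.score(X[te]))  # per-sample log-likelihood on held-out fold
    return float(np.mean(scores))

# Pick the component count with the best held-out score
best = max(range(1, 9), key=lambda k: cv_loglik(X, k))
```

Unlike the Bayesian method, this requires a separate fit for every candidate model size and every fold.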

Experimental Results: component elimination
Initially the model had 15 mixture components; it was finally pruned down to 3.

Experimental Results

Automatic Relevance Determination (ARD): hidden Markov model
Given an observed dataset of sequences, we assume each data sequence is generated independently from an HMM.
Likelihood: p(X | p, A, θ) = Σ_s p(s_1) ∏_{t>1} A_{s_{t-1}, s_t} ∏_t p(x_t | s_t, θ), where p is the initial-state distribution and A the transition matrix
Prior: priors over the emission parameters θ
Posterior: approximated by VB
Determination of p and A: Type II ML
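The HMM likelihood is evaluated with the standard forward algorithm; a generic log-space sketch (not specific to the models in this talk):

```python
import numpy as np

def forward_loglik(log_emit, p, A):
    """Log-likelihood of one sequence under an HMM via the forward
    algorithm. log_emit[t, j] = log p(x_t | s_t = j); p is the
    initial-state distribution and A the transition matrix."""
    log_alpha = np.log(p) + log_emit[0]
    for t in range(1, len(log_emit)):
        # log_alpha_j(t) = log_emit[t, j] + log sum_i exp(log_alpha_i(t-1)) * A[i, j]
        m = log_alpha.max()
        log_alpha = np.log(np.exp(log_alpha - m) @ A) + m + log_emit[t]
    m = log_alpha.max()
    return m + np.log(np.exp(log_alpha - m).sum())
```

Working in log space keeps long sequences from underflowing the recursion.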

Automatic Relevance Determination (ARD): model selection
Bayesian method: start with more states than needed.
State elimination: if the visiting frequency ν_j → 0, state j is never occupied and can be pruned.
Define the visiting frequency ν_j = (1/T) Σ_t P(s_t = j), the expected fraction of time the chain spends in state j, where the occupancy probabilities P(s_t = j) follow from p and A.
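The visiting frequency can be computed directly from p and A by propagating the state-occupancy distribution; a small sketch, assuming the definition ν_j = (1/T) Σ_t P(s_t = j):

```python
import numpy as np

def visiting_frequency(p, A, T):
    """Expected fraction of time spent in each state over a length-T
    chain with initial distribution p and transition matrix A."""
    dist = p.copy()            # P(s_t = j), starting at t = 1
    nu = np.zeros_like(p)
    for _ in range(T):
        nu += dist             # accumulate occupancy probabilities
        dist = dist @ A        # advance one step: P(s_{t+1}) = P(s_t) A
    return nu / T
```

A state j with ν_j ≈ 0 is effectively never visited, and is the HMM analogue of a mixture component with π_k ≈ 0.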

Experimental Results (1)

Experimental Results (2)

Experimental Results (3)

Questions?