High-Dimensional Unsupervised Selection and Estimation of a Finite Generalized Dirichlet Mixture Model Based on Minimum Message Length, by Nizar Bouguila and Djemel Ziou.

Presentation transcript:

High-Dimensional Unsupervised Selection and Estimation of a Finite Generalized Dirichlet Mixture Model Based on Minimum Message Length, by Nizar Bouguila and Djemel Ziou. Discussion led by Qi An, Duke University Machine Learning Group.

Outline
– Introduction
– The generalized Dirichlet mixture
– The minimum message length (MML) criterion
– Fisher information matrix and priors
– Density estimation and model selection
– Experimental results
– Conclusions

Introduction
How do we determine the number of components in a mixture model for high-dimensional data?
– Stochastic and resampling approaches (slow)
  – Implementation of model selection criteria
  – Fully Bayesian approaches
– Deterministic approaches (fast)
  – Approximate Bayesian criteria
  – Information/coding theory concepts, e.g. the minimum message length (MML) and Akaike's information criterion (AIC)

The generalized Dirichlet distribution
A d-dimensional generalized Dirichlet distribution is defined as
p(X | α, β) = ∏_{l=1}^{d} [Γ(α_l + β_l) / (Γ(α_l) Γ(β_l))] X_l^(α_l − 1) (1 − ∑_{k=1}^{l} X_k)^(γ_l),
where X_l > 0, ∑_{l=1}^{d} X_l < 1, γ_l = β_l − α_{l+1} − β_{l+1} for l = 1, …, d−1, and γ_d = β_d − 1.
It reduces to the Dirichlet distribution when β_l = α_{l+1} + β_{l+1}.
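The density above can be evaluated directly; the snippet below is a minimal NumPy/SciPy sketch of the log-density (an illustration, not the paper's code). The Dirichlet special case is recovered by choosing β_l = α_{l+1} + β_{l+1}.

```python
import numpy as np
from scipy.special import gammaln

def gd_logpdf(x, alpha, beta):
    """Log-density of the d-dimensional generalized Dirichlet at x.

    x     : length-d array with x_l > 0 and sum(x) < 1
    alpha : shape parameters alpha_1..alpha_d (all > 0)
    beta  : shape parameters beta_1..beta_d  (all > 0)
    """
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # gamma_l = beta_l - alpha_{l+1} - beta_{l+1} for l < d, and gamma_d = beta_d - 1
    gamma = beta.copy()
    gamma[:-1] -= alpha[1:] + beta[1:]
    gamma[-1] -= 1.0
    # remaining mass after the first l coordinates: 1 - (x_1 + ... + x_l)
    remaining = 1.0 - np.cumsum(x)
    log_norm = gammaln(alpha + beta) - gammaln(alpha) - gammaln(beta)
    return float(np.sum(log_norm + (alpha - 1.0) * np.log(x)
                        + gamma * np.log(remaining)))
```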

The generalized Dirichlet distribution
For the generalized Dirichlet distribution the moments have closed forms; in particular E(X_l) = [α_l / (α_l + β_l)] ∏_{k=1}^{l−1} β_k / (α_k + β_k).
The GDD has a more general covariance structure than the Dirichlet distribution (DD), and it is conjugate to the multinomial distribution.

GDD vs. Gaussian
The GDD has a smaller number of parameters to estimate, so the estimation can be more accurate.
The GDD is defined on the support [0, 1] and can be extended to a compact support [A, B], which is more appropriate for the nature of the data.
Beta distribution: p(u) = Γ(α+β)/(Γ(α)Γ(β)) u^(α−1) (1−u)^(β−1), for 0 < u < 1.
Beta type-II distribution: p(v) = Γ(α+β)/(Γ(α)Γ(β)) v^(α−1) (1+v)^(−(α+β)), for v > 0.
They coincide under the change of variable u = v/(1+v).
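As a quick numerical check of the u = v/(1+v) relationship (an illustrative sketch, not from the paper), one can transform Beta type-II samples and compare their moments with the corresponding Beta distribution:

```python
import numpy as np
from scipy import stats

a, b = 2.5, 4.0
v = stats.betaprime(a, b).rvs(size=100_000, random_state=0)
u = v / (1.0 + v)                          # the change of variable from the slide
print(u.mean(), stats.beta(a, b).mean())   # both close to a / (a + b) = 0.3846...
print(u.var(),  stats.beta(a, b).var())
```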

A GDD mixture model
A generalized Dirichlet mixture model with M components is
p(X | Θ) = ∑_{j=1}^{M} p_j p(X | α_j), with p_j ≥ 0 and ∑_{j=1}^{M} p_j = 1,
where each component density p(X | α_j) takes the form of the GDD.
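The corresponding mixture log-density can be sketched as follows (again illustrative, reusing the gd_logpdf helper from the earlier snippet):

```python
import numpy as np
from scipy.special import logsumexp

def gd_mixture_logpdf(x, weights, alphas, betas):
    """log p(X | Theta) = log sum_j p_j * p(X | alpha_j, beta_j)."""
    comp = [np.log(w) + gd_logpdf(x, a, b)      # gd_logpdf: sketch above
            for w, a, b in zip(weights, alphas, betas)]
    return logsumexp(comp)
```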

The MML criterion
The message length is defined as minus the logarithm of the posterior probability. After placing an explicit prior over the parameters, the message length for a mixture of distributions is given by
MessLen ≈ −log h(Θ) − log p(X | Θ) + (1/2) log |F(Θ)| + (N_p/2)(1 + log κ_{N_p}),
where the four terms correspond to the prior, the likelihood, the Fisher information, and the optimal quantization constant (N_p is the number of free parameters).
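In code the criterion is just the sum of these four terms. The sketch below assumes the log-prior, log-likelihood and log-determinant of the Fisher information are computed elsewhere, and uses the common 1/12 value as a stand-in for the quantization constant κ (an assumption for illustration, not the paper's exact lattice constant):

```python
import numpy as np

def message_length(log_prior, log_lik, log_det_fisher, n_params, kappa=1.0 / 12.0):
    """Two-part message length:
    -log h(Theta) - log p(X|Theta) + 0.5*log|F(Theta)| + 0.5*N_p*(1 + log kappa)."""
    return (-log_prior - log_lik + 0.5 * log_det_fisher
            + 0.5 * n_params * (1.0 + np.log(kappa)))
```

The number of components M is then selected by minimizing this quantity over candidate models.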

Fisher information matrix
The Fisher information matrix is the expected value of the Hessian of minus the logarithm of the likelihood, F(Θ) = E[−∂² log p(X | Θ) / ∂Θ ∂Θᵀ]; its determinant enters the message length through the (1/2) log |F(Θ)| term.
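The paper derives the Fisher information analytically. Purely as an illustration of the definition, a finite-difference approximation of the Hessian of the negative log-likelihood can be written as below (neg_log_lik and theta are placeholder names, not from the paper):

```python
import numpy as np

def observed_information(neg_log_lik, theta, eps=1e-5):
    """Finite-difference Hessian of the negative log-likelihood at theta;
    its expectation over the data is the Fisher information matrix."""
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = eps
            ej = np.zeros(n); ej[j] = eps
            hess[i, j] = (neg_log_lik(theta + ei + ej) - neg_log_lik(theta + ei - ej)
                          - neg_log_lik(theta - ei + ej) + neg_log_lik(theta - ei - ej)
                          ) / (4.0 * eps ** 2)
    return hess
```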

Prior distribution
Assume independence between the different components, the mixture weights, and the GDD parameters.
Place a Dirichlet distribution on the mixture weights P and a generalized Dirichlet distribution on the parameters α, respectively, with all parameters set to 1 (i.e., uniform priors).

Message length
After obtaining the Fisher information and specifying the prior distribution, the message length can be expressed in closed form by substituting both into the MML expression above; the number of components is chosen to minimize it.

Estimation and selection algorithm
The authors use an EM algorithm to estimate the mixture parameters. To overcome the computational cost and the local-maxima problem, they implement a fairly sophisticated initialization algorithm. The whole algorithm is summarized on the next slide.
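For orientation only, the generic EM loop structure for a GD mixture looks like the sketch below (not the authors' implementation: it reuses gd_logpdf from above, and it omits both the Newton-type updates of the α/β parameters, which have no closed form, and the MML-driven selection of M):

```python
import numpy as np
from scipy.special import logsumexp

def em_step(data, weights, alphas, betas):
    """One EM iteration for a GD mixture (loop structure only).

    E-step: responsibilities r[n, j] = p(component j | x_n).
    M-step: closed-form weight update; the per-component alpha/beta updates
    would follow from a Newton-type step on the weighted log-likelihood.
    """
    N, M = len(data), len(weights)
    log_r = np.array([[np.log(weights[j]) + gd_logpdf(x, alphas[j], betas[j])
                       for j in range(M)] for x in data])
    log_r -= logsumexp(log_r, axis=1, keepdims=True)   # normalize per data point
    r = np.exp(log_r)
    new_weights = r.sum(axis=0) / N
    return r, new_weights
```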

Experimental results
The correct numbers of mixture components are 5, 6, and 7, respectively.

Experimental results

Web mining:
– Training with multiple classes of labels
– Used to predict the labels of testing samples
– Uses the frequencies of the top 200 words as features
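One plausible way to build the top-200 word-frequency features is sketched below with scikit-learn (an assumption about the preprocessing, not necessarily the authors' exact pipeline; documents is a placeholder list of raw web-page texts):

```python
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(max_features=200)        # keep the 200 most frequent words
counts = vectorizer.fit_transform(documents).toarray().astype(float)
freqs = counts / counts.sum(axis=1, keepdims=True)    # per-document word frequencies
```

Normalizing the counts to proportions matches the [0, 1] support assumed by the GD mixture.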

Conclusions
An MML-based criterion is proposed to select the number of components in generalized Dirichlet mixtures. The full dimensionality of the data is used. Generalized Dirichlet mixtures allow more modeling flexibility than mixtures of Gaussians. The results indicate clearly that the MML and LEC (Laplace empirical criterion) model selection methods outperform the other methods.