Variational Bayesian Methods for Audio Indexing


Variational Bayesian Methods for Audio Indexing
Fabio Valente, Christian Wellekens
Institut Eurecom

Outline
- Generalities on speaker clustering
- Model selection / BIC
- Variational learning
- Variational model selection
- Results

Speaker clustering
- Many applications (speaker indexing, speech recognition) require clustering segments with the same characteristics, e.g. speech from the same speaker.
- Goal: group together speech segments from the same speaker.
- Fully connected (ergodic) HMM topology with a duration constraint; each state represents a speaker.
- When the number of speakers is not known, it must be estimated with a model selection criterion (e.g. BIC).

Model selection
- Given data Y and a model m, the optimal model maximizes the posterior p(m|Y).
- If the prior over models is uniform, the decision depends only on p(Y|m), a.k.a. the marginal likelihood.
- Bayesian modeling assumes distributions over the parameters; the criterion is thus the marginal likelihood, obtained by integrating the likelihood over the parameter prior.
- This integral is prohibitive to compute for some models (HMM, GMM).
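The slide equations are not reproduced in the transcript; in standard Bayesian notation the two quantities referred to above are presumably:

```latex
% Model posterior and marginal likelihood (standard forms; the original
% slide images are not available in the transcript).
\begin{align}
  p(m \mid Y) &\propto p(Y \mid m)\, p(m), \\
  p(Y \mid m) &= \int p(Y \mid \theta, m)\, p(\theta \mid m)\, d\theta .
\end{align}
```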

Bayesian information criterion (BIC)
- First-order approximation obtained from the Laplace approximation of the marginal likelihood (Schwarz, 1978).
- In practice, the penalty term is multiplied by a constant (threshold).
- BIC does not depend on parameter distributions!
- Asymptotically (large n) BIC converges to the log-marginal likelihood.
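A minimal sketch of the thresholded BIC score described above; the function name, arguments, and the example numbers are illustrative, not taken from the original slides:

```python
import math

def bic_score(log_likelihood: float, n_params: int, n_frames: int,
              threshold: float = 1.0) -> float:
    """Thresholded BIC: log-likelihood minus a penalty proportional to model
    complexity. threshold = 1.0 recovers the standard BIC penalty."""
    penalty = 0.5 * n_params * math.log(n_frames)
    return log_likelihood - threshold * penalty

# Example: pick the speaker count whose model maximizes the BIC score
# (log-likelihoods and parameter counts below are hypothetical placeholders).
log_likelihoods = {8: -1.02e5, 10: -0.99e5, 13: -0.97e5}
param_counts = {8: 3200, 10: 4000, 13: 5200}
best_n = max(log_likelihoods,
             key=lambda n: bic_score(log_likelihoods[n], param_counts[n], 50000))
```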

Variational Learning
- Introduce an approximate variational distribution q over the parameters.
- Applying Jensen's inequality gives a lower bound on ln p(Y|m) (see the bound below).
- Maximization of ln p(Y|m) is then replaced by maximization of this bound (the variational free energy).
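The bound in question, reconstructed in standard notation since the slide equations are not in the transcript:

```latex
% Jensen's inequality applied to the marginal log-likelihood:
\ln p(Y \mid m)
  = \ln \int q(\theta)\,\frac{p(Y,\theta \mid m)}{q(\theta)}\, d\theta
  \;\ge\; \int q(\theta)\,\ln \frac{p(Y,\theta \mid m)}{q(\theta)}\, d\theta
  \;=\; \mathcal{F}[q],
% with equality when q(\theta) = p(\theta \mid Y, m).
```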

Variational Learning with hidden variables
- Sometimes model optimization requires hidden variables (e.g. the state sequence in EM).
- If x is the hidden variable, the bound can be written over the joint distribution of x and the parameters.
- Independence hypothesis: the variational posterior factorizes as q(x, θ) = q(x) q(θ).
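A sketch of the corresponding bound under the mean-field factorization stated above (again, the exact slide formula is not in the transcript):

```latex
% Free energy with hidden variables, assuming q(x, \theta) = q(x)\, q(\theta):
\ln p(Y \mid m) \;\ge\;
  \sum_x \int q(x)\, q(\theta)\,
  \ln \frac{p(Y, x, \theta \mid m)}{q(x)\, q(\theta)}\, d\theta
  \;=\; \mathcal{F}[q(x), q(\theta)].
```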

EM-like algorithm
- Under the independence hypothesis, the free energy can be maximized by alternating two coordinate-ascent updates (see the sketch below).
- E-step: update q(x) with q(θ) fixed.
- M-step: update q(θ) with q(x) fixed.
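The standard variational EM updates, which are presumably what the missing slide equations show:

```latex
% VB E-step: update the hidden-variable posterior with q(\theta) fixed
q(x) \;\propto\; \exp\!\left( \mathbb{E}_{q(\theta)}\!\left[ \ln p(Y, x \mid \theta, m) \right] \right)

% VB M-step: update the parameter posterior with q(x) fixed
q(\theta) \;\propto\; p(\theta \mid m)\,
           \exp\!\left( \mathbb{E}_{q(x)}\!\left[ \ln p(Y, x \mid \theta, m) \right] \right)
```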

VB Model selection
- In the same way, an approximate posterior distribution q(m) over models can be defined.
- Maximizing the free energy w.r.t. q(m) yields q(m) ∝ p(m) exp(F_m), where F_m is the free energy of model m.
- Model selection is based on the free energy: the best model maximizes q(m).
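A minimal sketch of how this selection rule could be applied in practice; the free-energy values and names below are hypothetical placeholders, not results from the paper:

```python
import math

def model_posterior(free_energies: dict, log_prior: dict | None = None) -> dict:
    """q(m) proportional to p(m) * exp(F_m), normalized with the log-sum-exp trick."""
    if log_prior is None:
        log_prior = {m: 0.0 for m in free_energies}   # uniform prior over models
    log_q = {m: log_prior[m] + f for m, f in free_energies.items()}
    mx = max(log_q.values())
    z = sum(math.exp(v - mx) for v in log_q.values())
    return {m: math.exp(v - mx) / z for m, v in log_q.items()}

# Hypothetical free energies for candidate speaker counts:
free_energies = {8: -1.003e5, 12: -0.998e5, 15: -0.995e5}
q_m = model_posterior(free_energies)
best_m = max(q_m, key=q_m.get)   # best model maximizes q(m)
```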

Experimental framework
- BN-96 Hub4 evaluation data set.
- Initialize a model with N speakers (states) and train the system using VB and ML (or VB and MAP with a UBM).
- Reduce the speaker number from N-1 down to 1, training with VB and ML (or MAP) at each step.
- Score the N models with VB and BIC and choose the best one.
- Three scores are reported: the best score, the selected score (with VB or BIC), and the score obtained with the known speaker number.
- Results are given in terms of acp (average cluster purity), asp (average speaker purity), and a combined score K (a computation sketch follows below).
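A minimal sketch of the purity metrics, assuming the usual definitions from the speaker-clustering literature and assuming K is the geometric mean sqrt(acp * asp), which matches the values in the tables below; the function and variable names are illustrative:

```python
import numpy as np

def purity_scores(counts: np.ndarray):
    """counts[i, j] = number of frames of speaker j assigned to cluster i.
    Standard definitions:
      acp = (1/N) * sum_i sum_j n_ij^2 / n_i.
      asp = (1/N) * sum_j sum_i n_ij^2 / n_.j
      K   = sqrt(acp * asp)
    """
    n = counts.sum()
    cluster_totals = counts.sum(axis=1)    # n_i.
    speaker_totals = counts.sum(axis=0)    # n_.j
    acp = (counts ** 2 / cluster_totals[:, None]).sum() / n
    asp = (counts ** 2 / speaker_totals[None, :]).sum() / n
    return acp, asp, float(np.sqrt(acp * asp))

# Toy example: 2 clusters, 2 speakers (frame counts are made up).
counts = np.array([[90.0, 10.0],
                   [20.0, 80.0]])
acp, asp, K = purity_scores(counts)
```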

Experiments I

File 1       N    acp   asp   K
ML-known     8    0.60  0.84  0.71
ML-best      10   0.80  0.86  0.83
ML/BIC       13   -     -     -

File 1       N    acp   asp   K
VB-known     8    0.70  0.91  0.80
VB-best      12   0.85  0.89  0.87
VB           15   -     -     -

File 2       N    acp   asp   K
ML-known     14   0.76  0.67  0.72
ML-best      9    0.77  0.74  -
ML/BIC       13   0.84  0.63  0.73

File 2       N    acp   asp   K
VB-known     14   0.75  0.82  0.78
VB-best      -    0.84  0.81  -
VB           -    -     -     -

Experiments II

File 3       N    acp   asp   K
ML-known     16   0.75  0.74  -
ML-best      15   0.77  0.83  0.80
ML/BIC       -    -     -     -

File 3       N    acp   asp   K
VB-known     16   0.68  0.86  0.76
VB-best      14   0.75  0.90  0.82
VB           -    -     -     -

File 4       N    acp   asp   K
ML-known     21   0.72  0.65  0.68
ML-best      12   0.63  0.80  0.71
ML/BIC       -    0.76  0.60  -

File 4       N    acp   asp   K
VB-known     21   0.72  0.65  0.68
VB-best      13   0.63  0.80  0.71
VB           -    0.64  -     -

Dependence on threshold
Figures: K as a function of the threshold; estimated speaker number as a function of the threshold.

Free Energy vs. BIC

Experiments III

File 1       N    acp   asp   K
MAP-known    8    0.52  0.72  0.62
MAP-best     15   0.81  0.84  0.83
MAP/BIC      13   0.80  -     -

File 1       N    acp   asp   K
VB-known     8    0.68  0.88  0.77
VB-best      22   0.83  0.85  0.84
VB           -    -     -     -

File 2       N    acp   asp   K
MAP-known    14   0.68  0.78  0.73
MAP-best     22   0.84  0.80  0.82
MAP/BIC      18   0.85  0.81  -

File 2       N    acp   asp   K
VB-known     14   0.69  0.80  0.74
VB-best      18   0.85  0.87  0.86
VB           19   0.83  -     -

Experiments IV

File 3       N    acp   asp   K
MAP-known    16   0.71  0.77  0.74
MAP-best     29   0.78  0.76  -
MAP/BIC      -    0.69  0.73  -

File 3       N    acp   asp   K
VB-known     16   0.74  0.83  0.78
VB-best      22   0.82  -     -
VB           -    0.79  -     -

File 4       N    acp   asp   K
MAP-known    18   0.65  0.69  0.67
MAP-best     -    -     -     -
MAP/BIC      20   0.63  0.64  -

File 4       N    acp   asp   K
VB-known     21   0.67  0.73  0.70
VB-best      20   0.69  0.72  -
VB           19   -     -     -

Conclusions and Future Work
- VB uses the free energy for both parameter learning and model selection.
- VB generalizes both the ML and MAP learning frameworks.
- VB outperforms ML/BIC on 3 of the 4 BN files.
- VB outperforms MAP/BIC on 4 of the 4 BN files.
- Future work: repeat the experiments on other databases (e.g. NIST speaker diarization).

Thanks for your attention!

Data vs. Gaussian components
Figure: final number of Gaussian components as a function of the amount of data for each speaker.

Experiments (file 1)

             Real   VB   ML/BIC
Speakers     8      15   13

Experiments (file 2)

             Real   VB   ML/BIC
Speakers     14     16   -

Experiments (file 3)

             Real   VB   ML/BIC
Speakers     16     14   15

Experiments (file 4)

             Real   VB   ML/BIC
Speakers     21     13   12