1 Robust HMM classification schemes for speaker recognition using integral decode. Marie Roch, Florida International University

2 Who am I?

3 Speaker Recognition
Types of speaker recognition:
– Verification vs. identification
– Text dependent vs. text independent

4 Speaker Recognition
Why is it hard?
– Minimal training data
– Background noise
– Transducer mismatch
– Channel distortions
– People's voices change over time and under stress
(Performance figure not reproduced in the transcript.)

5 Feature Extraction
– Extract speech
– Spectral analysis
– Cepstrum (formula not reproduced in the transcript)
– Cepstral mean removal
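A minimal sketch of a cepstral front end with cepstral mean removal. The Hamming-windowed 25 ms frames, 10 ms hop, 12 coefficients, and plain FFT-cepstrum pipeline are illustrative assumptions, not parameters taken from the talk:

```python
import numpy as np

def cepstral_features(signal, sample_rate, n_ceps=12,
                      frame_ms=25.0, hop_ms=10.0):
    """Sketch of a cepstral front end with cepstral mean removal.
    Frame sizes and cepstral order are assumed, not from the talk."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    window = np.hamming(frame_len)
    coeffs = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame)) + 1e-10   # spectral analysis
        cepstrum = np.fft.irfft(np.log(spectrum))       # real cepstrum
        coeffs.append(cepstrum[1:n_ceps + 1])           # keep low quefrencies
    feats = np.array(coeffs)
    return feats - feats.mean(axis=0)                   # cepstral mean removal
```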

6 Hidden Markov Models
– Statistical pattern recognition
– State dependent modeling
  – One output distribution per state
  – Radial basis functions common
– State sequence unobservable

7 HMM
– Efficient decoders
– Training
  – EM algorithm
  – Convergence to a local maximum guaranteed
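As a sketch of the efficient decoding referred to above, a log-domain forward pass that evaluates log p(O | model). The per-state emission densities are left abstract, since the systems in the talk use continuous (e.g. radial basis function) state densities; this is only an illustration of the recursion:

```python
import numpy as np
from scipy.special import logsumexp

def forward_log_likelihood(frames, log_trans, log_init, log_emission):
    """Log-domain forward algorithm, O(T * S^2).
    log_trans[i, j] = log P(state j | state i); log_init[i] = log P(state i);
    log_emission(frame) returns a length-S vector of per-state log densities."""
    alpha = log_init + log_emission(frames[0])
    for frame in frames[1:]:
        alpha = logsumexp(alpha[:, None] + log_trans, axis=0) \
                + log_emission(frame)
    return logsumexp(alpha)
```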

8 Recognition
– Model for each speaker
– Maximum a posteriori (MAP) decision rule
(Block diagram: features scored against each speaker model, arg max selects the recognized speaker; not reproduced in the transcript.)
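A sketch of the recognition step under the assumed setup of one trained model per speaker; with speaker priors included, the MAP rule is the arg max of log-likelihood plus log-prior. Function and variable names here are illustrative:

```python
import numpy as np

def map_decision(frames, speaker_models, log_prior, score):
    """MAP decision rule sketch: arg max over speakers of
    log p(frames | model_s) + log P(s).  `score` is any model
    log-likelihood function (e.g. the forward pass above)."""
    best, best_val = None, -np.inf
    for spk, model in speaker_models.items():
        val = score(frames, model) + log_prior[spk]
        if val > best_val:
            best, best_val = spk, val
    return best
```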

9 The MAP decision rule
– Optimal decision rule provided we have accurate distribution parameters and observations.
– Problem:
  – Corruption of feature vectors.
  – Distribution known to be inaccurate.

10 A case of mistaken identity

11 Integral decode
– Goal: include the uncorrupted observation ô_t.
– Problem: ô_t is unobservable.
– Determine a local neighborhood about o_t and use a priori information to weight the likelihood (formula not reproduced in the transcript; a hedged reconstruction follows).
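The integral-decode expression itself did not survive the transcript. As an assumption consistent with the slide text (a neighborhood Ω_t about the observed frame o_t and an a priori weighting of candidate clean frames ô_t), one plausible form is:

```latex
p(o_t \mid \lambda) \;\approx\; \int_{\Omega_t} p(\hat{o}_t \mid \lambda)\, w(\hat{o}_t \mid o_t)\, d\hat{o}_t
```

where λ is a speaker model and w(· | o_t) is the a priori weight over candidate uncorrupted observations; the exact form used in the talk may differ.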

12 Integral decode issues
– Problems approximating the integral
  – High frame rate × number of models
  – Non-trivial dimensionality
– Selection of the neighborhood

13 Approximating the integral
– Monte Carlo impractical
– Use a simplified cubature technique (rule not reproduced in the transcript)
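Since the slide's cubature rule is not reproduced, the sketch below uses, purely as an illustration, the standard third-degree spherical-radial cubature rule to approximate the expectation of a state likelihood under a Gaussian error model centred on the observed frame; the simplified technique used in the talk may be different:

```python
import numpy as np

def cubature_expectation(f, mean, cov):
    """Third-degree spherical-radial cubature rule for E[f(x)] with
    x ~ N(mean, cov): average f over 2d symmetric sigma points.
    Illustrative only; not necessarily the rule used in the talk."""
    d = mean.shape[0]
    chol = np.linalg.cholesky(cov)        # cov = chol @ chol.T
    scale = np.sqrt(d)
    total = 0.0
    for i in range(d):
        for sign in (1.0, -1.0):
            total += f(mean + sign * scale * chol[:, i])
    return total / (2 * d)

# Usage sketch: an integral-decode style score for one frame and one state
#   score = cubature_expectation(state_likelihood, observed_frame, error_cov)
```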

14 Neighborhood choice
Choosing an appropriate neighborhood:
– Upper bound difference neighborhoods [Merhav and Lee 93]
– Error source modeling

15 Upper bound difference neighborhoods
– Arbitrary signal pairs with a few general conditions.
(Figure relating PSD and cepstra not reproduced in the transcript.)

16 Taking the upper bound
Asymptotic difference between cepstral parameters (bound not reproduced in the transcript).

17 Error source modeling
– Multiple error sources
– Simplifying assumption of one normal distribution with zero mean
– Use time series analysis to estimate the noise
(Trend figure not reproduced in the transcript.)

18 Error Source Modeling Estimate variance from detrended signal

19 Error source modeling
– Problem: the support of the assumed normal error model is infinite.
– Solution:
  – Most of the points are outliers.
  – Set a percentage of the distribution beyond which points are culled.
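A sketch of one way to realize these steps, assuming the error for each cepstral coefficient is modeled as zero-mean Gaussian after removing a linear trend, and the neighborhood is then truncated to keep a fixed fraction of the probability mass. The detrending method, the 95% default, and all names are assumptions:

```python
import numpy as np
from scipy.stats import norm

def truncated_error_neighborhood(track, keep_fraction=0.95):
    """Detrend one cepstral coefficient's time series, estimate the
    error variance from the residual, and return the half-width that
    keeps `keep_fraction` of a zero-mean Gaussian (culling the tails
    so the integration neighborhood is finite)."""
    frames = np.arange(len(track))
    slope, intercept = np.polyfit(frames, track, deg=1)   # linear trend
    residual = track - (slope * frames + intercept)       # detrended signal
    sigma = residual.std(ddof=1)                          # error std. dev.
    half_width = norm.ppf(0.5 + keep_fraction / 2.0) * sigma
    return sigma, half_width
```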

20 Complexity of integration
– Expensive
– Ways to reduce/cope
  – Implemented: Top K processing, Principal Components Analysis
  – Possible: Gaussian selection, sub-band models, SIMD or MIMD parallelism
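The transcript does not say exactly what Top K processing prunes; the sketch below shows one plausible interpretation, offered only as an assumption: rank the speaker models with a cheap standard decode, then apply the expensive integral decode only to the K best candidates. All function names are hypothetical.

```python
import numpy as np

def top_k_integral_decode(frames, models, cheap_score, expensive_score, k=5):
    """Hypothetical Top-K scheme: cheap ranking of all speaker models,
    expensive (integral-decode style) rescoring of the K best only."""
    cheap = np.array([cheap_score(frames, m) for m in models])
    top_idx = np.argsort(cheap)[::-1][:k]        # indices of K highest scores
    best_idx, best_score = None, -np.inf
    for i in top_idx:
        s = expensive_score(frames, models[i])
        if s > best_score:
            best_idx, best_score = i, s
    return best_idx, best_score
```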

21 Top K Processing
(Performance charts for 1-, 3-, and 5-second test utterances not reproduced in the transcript.)

22 Principal Component Analysis Choose the P most important directions

23 Principal Component Analysis Integrate using the new basis set for the step function
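A minimal PCA sketch in the spirit of these two slides: find the P highest-variance directions of the training features so that integration can be carried out over the reduced basis. How the step function and integration limits are handled in the talk is not specified here; this only shows the basis computation, with assumed names:

```python
import numpy as np

def principal_directions(features, p):
    """Return the P directions of largest variance of the feature
    vectors (rows of `features`); integrating only along these
    directions reduces the cubature cost."""
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:p]
    return eigvecs[:, order]                     # d x P basis

# Usage sketch: project each frame before integrating
#   basis = principal_directions(training_frames, p=6)
#   reduced_frame = basis.T @ frame
```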

24 Speech Corpus
King-92
– Used San Diego subset
  – 26 male speakers
  – Long distance telephone speech
  – Quiet room environment
– 5 sessions recorded one week apart
  – Sessions 1-3 used for training
  – Sessions 4-5 partitioned into test segments

25 Baseline performance

26 Integral decode performance
(Performance charts for 1-, 3-, and 5-second test utterances not reproduced in the transcript.)

27 Integral decode with other conditions
Performance on:
– high quality speech
– transducer mismatch

28 Future work
– Extensions to the integral decode
  – Automatic parameter selection
  – Gaussian selection
  – Distributed computation
– Efficient multiple class preclassifiers

29

30 Optimal/utterance hyperparameters – 5 seconds
(Table of per-corpus results: KingNB26, KingWB51, SpidreF18XDR, SpidreM27XDR; remaining values not reproduced in the transcript.)

31 95% Confidence Intervals
Caveat:
– Per speaker means
– Large granularity

32 Pattern Recognition
– Long term statistics [Bricker et al 71, Markel et al 77]
– Vector Quantization [Soong et al 87]
– HMM [Rosenberg et al 90, Tishby 91, Matsui & Furui 92, Reynolds et al 95]
– Connectionist frameworks
  – Feed forward [Oglesby & Mason 90]
  – Learning vector quantization [He et al 99]

33 Pattern Recognition (continued)
Hybrid/Modified HMMs
– Min Classification Error discriminant [Liu et al 95]
– Tree structured neural classifiers [Liou & Mammone 95]
– Trajectory modeling [Russell et al 85, Liu et al 95, Ostendorf et al 96, He et al 99]
– Sub-band recognition [Besacier & Bonastre 97]