
SPEAKER VERIFICATION USING SUPPORT VECTOR MACHINES
S. Raghavan, G. Lazarou and J. Picone
Intelligent Electronic Systems, Center for Advanced Vehicular Systems, Mississippi State University
URL: http://www.cavs.msstate.edu/hse/ies/publications/conferences/ieee_secon/2006/support_vectors/

Speaker Verification
Speaker verification uses voice as a biometric to determine the authenticity of a user. Speaker verification systems consist of two essential operations:
Enrollment: the system learns the speaker's acoustic information and builds a speaker model or template.
Verification: the claimed speaker's utterance is compared against the model and a likelihood score is computed. A threshold is used to discriminate the true speaker from an impostor.
This presentation focuses on the classifier portion of the speaker verification system.

Baseline System
[Figure: distribution of the fourth cepstral coefficient from one of the utterances in the NIST 2003 Speaker Recognition dataset.] The distribution cannot be modeled by a single Gaussian; by using two Gaussians, we can achieve a more accurate representation of the data.
The baseline system used a Gaussian Mixture Model (GMM) as the underlying classifier, with 39-dimensional Mel-Frequency Cepstral Coefficients (MFCCs) as input features.
[Block diagram: train utterance → statistical model; test utterance → compute likelihood → decision.]
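As a rough illustration of this enrollment-and-scoring loop, here is a minimal sketch using scikit-learn's GaussianMixture; the array names, the two-component choice, and the threshold are assumptions, not the authors' configuration:

```python
# Minimal sketch of a GMM-based speaker model (not the authors' exact code).
# Assumes `train_mfccs` and `test_mfccs` are NumPy arrays of shape
# (num_frames, 39) holding 39-dimensional MFCC feature vectors.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_gmm(train_mfccs: np.ndarray, n_components: int = 2) -> GaussianMixture:
    """Fit a Gaussian mixture to a speaker's enrollment features."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    gmm.fit(train_mfccs)
    return gmm

def verify_claim(gmm: GaussianMixture, test_mfccs: np.ndarray, threshold: float) -> bool:
    """Average per-frame log-likelihood; accept the claim if above threshold."""
    score = gmm.score(test_mfccs)  # mean log-likelihood per frame
    return score > threshold
```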

Drawbacks of Using Gaussian Mixtures
The probability of obtaining the correct hypothesis given the test data is a posterior probability, and it can be decomposed into a prior and a conditional probability using Bayes' rule. The parameters of P(A|S) are estimated from the data using Maximum Likelihood Estimation (MLE). The parametric form of P(A|S) is generally assumed to be Gaussian. Since the acoustic data may not actually be Gaussian, we end up with approximations that lead to classification errors.
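Spelled out in standard notation (implied by the slide rather than reproduced from it), with A the acoustic observations and S the speaker hypothesis, the decomposition is Bayes' rule:

$$P(S \mid A) = \frac{P(A \mid S)\,P(S)}{P(A)}$$

The GMM models the class-conditional likelihood P(A|S) under an assumed Gaussian parametric form; the classification errors described above arise when that assumed form does not match the true distribution of A.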

Optimal Decision Boundary
Maximum-likelihood convergence does not translate into optimal classification if the a priori assumptions about the data are incorrect. At best one can approximate the distribution with a mixture of Gaussians, but the problem of finding the optimal decision boundary remains. An SVM makes no Gaussian assumption; instead, it is a purely discriminative approach in which the boundaries are learned directly from the data.

Support Vector Machines (SVMs)
SVMs are binary classifiers that learn the decision region through discriminative training. SVMs transform the data to a higher-dimensional space using kernel functions. In the higher-dimensional space, an SVM can construct a linear hyperplane to form the decision boundary, a boundary that would have to be non-linear in the lower-dimensional space. SVMs require labeled data for training, and they find an optimal decision boundary.
[Figure: two classes separated by candidate hyperplanes C0, C1, C2, with margin hyperplanes H1 and H2 and normal vector w; C0 is the optimal classifier.]
Hyperplanes C0, C1, and C2 all achieve zero empirical risk, but C0 generalizes optimally. The data points that define the boundary are called support vectors.
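As a toy illustration of the kernel idea (a sketch of my own, not from the slides), the following fits an RBF-kernel SVM to data that is not linearly separable in two dimensions; only a subset of the training points end up as support vectors:

```python
# An RBF-kernel SVM learns a boundary that is linear in the kernel-induced
# feature space but non-linear in the original 2-D input space.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two classes that are not linearly separable: an inner blob (class 0)
# surrounded by a noisy ring of points (class 1).
inner = rng.normal(0.0, 0.4, size=(100, 2))
angles = rng.uniform(0, 2 * np.pi, size=100)
ring = np.column_stack([2.0 * np.cos(angles), 2.0 * np.sin(angles)])
ring += rng.normal(0.0, 0.2, size=ring.shape)
X = np.vstack([inner, ring])
y = np.hstack([np.zeros(100), np.ones(100)])

clf = SVC(kernel="rbf", C=50, gamma=0.5)
clf.fit(X, y)
# Only a subset of the training points define the boundary.
print("support vectors per class:", clf.n_support_)
```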

Using SVMs for Optimal Decision Boundary
Hyperplane: $\mathbf{w} \cdot \mathbf{x} + b = 0$
Constraints: $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1$ for all training pairs $(\mathbf{x}_i, y_i)$, with labels $y_i \in \{-1, +1\}$
Quadratic optimization of a Lagrangian functional minimizes the risk criterion (maximizes the margin). Only a small portion of the training points receive non-zero Lagrange multipliers and become support vectors.
Final classifier: $f(\mathbf{x}) = \operatorname{sign}\left(\sum_i \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\right)$
Example result from the slide (toy classification task): LBG classifier, 27.88% error rate; SVM classifier, 21.15% error rate.
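Continuing the toy example above, the final classifier can be evaluated directly from the support vectors, the dual coefficients $\alpha_i y_i$, and the bias $b$; this is a sketch against scikit-learn's fitted SVC attributes, mirroring the formula on this slide:

```python
# Sketch: evaluate f(x) = sum_i alpha_i*y_i*K(x_i, x) + b from the support
# vectors of the `clf` fitted in the previous snippet.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def decision_value(clf, x):
    """Compute sum_i alpha_i*y_i*K(x_i, x) + b for a fitted RBF-kernel SVC."""
    x = np.atleast_2d(x)
    K = rbf_kernel(clf.support_vectors_, x, gamma=clf.gamma)  # K(x_i, x)
    # dual_coef_ stores alpha_i * y_i for each support vector x_i.
    return (clf.dual_coef_ @ K + clf.intercept_).ravel()

x_new = [0.1, -0.2]
# Should match sklearn's own decision_function output.
print(decision_value(clf, x_new), clf.decision_function([x_new]))
```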

Using SVMs for Speaker Verification
Speaker verification requires the system to make a binary decision, and the decision boundary can be learned from labeled data, so an SVM classifier is ideally suited to the task. It requires two sets of labeled data: in-class and out-of-class. The impostor (out-of-class) data can consist of data from all speakers except the one whose data is used as in-class. A classifier must be trained for each speaker. During verification, the speaker's utterance is scored against the claimed identity's classifier.

Block Diagram of SVM-Based Speaker Verification
Training (via Structural Risk Minimization, SRM): a model is created for every speaker in the training data set; the in-class data and out-of-class data are used to train an SVM model for that speaker.
Testing: the test data is scored against the claimed speaker's model, yielding a distance from the hyperplane between -1 and 1:
if (distance < threshold) {Reject Speaker} else {Accept Speaker}
The input data contains 39-dimensional MFCCs. The in-class data corresponds to a particular speaker, and the out-of-class data contains features of all other speakers. The SVM model contains the support vectors and their weights.
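A hedged sketch of this per-speaker training and scoring flow, assuming scikit-learn's SVC; the container names (`features`, `speakers`) and helper functions are hypothetical, and the C and gamma defaults are the values the slides settle on in the parameter analysis below:

```python
# Sketch (not the authors' implementation) of per-speaker SVM verification.
# `features[s]` is assumed to be an (n_frames, 39) array of MFCCs for
# speaker s; `speakers` is the list of enrolled speaker IDs.
import numpy as np
from sklearn.svm import SVC

def train_speaker_svms(features, speakers, C=50, gamma=0.019):
    """Train one binary SVM per speaker: in-class vs. pooled out-of-class frames."""
    models = {}
    for s in speakers:
        in_class = features[s]
        out_class = np.vstack([features[o] for o in speakers if o != s])
        X = np.vstack([in_class, out_class])
        y = np.hstack([np.ones(len(in_class)), -np.ones(len(out_class))])
        models[s] = SVC(kernel="rbf", C=C, gamma=gamma).fit(X, y)
    return models

def verify_speaker(models, claimed_id, test_mfccs, threshold=0.0):
    """Average the per-frame distance from the hyperplane; accept if above threshold."""
    score = models[claimed_id].decision_function(test_mfccs).mean()
    return score > threshold
```

Note that one binary classifier is trained per enrolled speaker, which is what makes training cost a concern and motivates subsampling the out-of-class data, as described next.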

Experimental Design
Database: NIST 2003. The development train and test sets contained 60 and 78 utterances, respectively; each utterance was approximately 2 minutes long. Standard 39-dimensional MFCCs were used as features.
The in-class training data contained the entire feature set from the speaker's utterance. The out-of-class training data contained randomly picked features from all the remaining speakers in the training set. Note: the SVM trainer was not exposed to all of the available out-of-class data; this was done to speed up the training process.
During testing, the distance of every MFCC test vector from the hyperplane is computed and averaged. If this average is greater than zero, the speaker is accepted; otherwise, the speaker is rejected.
A Detection Error Tradeoff (DET) curve was plotted using the true-speaker and impostor distances and compared with the baseline GMM speaker verification system.

Analyzing the Effect of Training Parameters
The parameters were varied and evaluated using the Detection Cost Function (DCF):
C_Det = C_Miss × P(Miss | Target) × P(Target) + C_FalseAlarm × P(FalseAlarm | NonTarget) × (1 − P(Target))
The NIST DET curve analysis program was used to obtain the DCF values. Two main parameters were analyzed:
Penalty (C): this parameter accounts for training errors. It was varied from 10 to 100 with no significant change in performance, so a mid-range value of 50 was chosen.
Kernel parameter (gamma): for gamma values from 2.5 down to 0.02, there was no change in the distance scores of the utterances in the test set. Performance was stable between 0.03 and 0.01, and the best performance was observed at gamma = 0.019.
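To make the cost function concrete, here is a minimal sketch; the weights C_Miss = 10, C_FalseAlarm = 1, and P(Target) = 0.01 are the standard NIST SRE settings and are an assumption here, since the slide does not state them:

```python
# Detection cost function (DCF) as defined above. The default weights are
# assumed (standard NIST SRE values), not taken from the slides.
def detection_cost(p_miss: float, p_fa: float,
                   c_miss: float = 10.0, c_fa: float = 1.0,
                   p_target: float = 0.01) -> float:
    """C_Det = C_Miss*P(miss|target)*P(target) + C_FA*P(fa|nontarget)*(1-P(target))."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)

# Example: an operating point with 14% misses and 2% false alarms.
print(detection_cost(p_miss=0.14, p_fa=0.02))  # 0.0338
```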

Effect of Penalty and Kernel Parameters
[Figure: DET curves (miss probability in % vs. false alarm probability in %) for various values of the RBF kernel parameter gamma.]

Minimum DCF as a function of gamma (C = 50):

  gamma    Min DCF
  0.010    0.2125
  0.015    0.2168
  0.019    0.1406
  0.030    0.2305

The DET curve helps in selecting a decision threshold and in choosing an optimal operating region for the system.
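The slides rely on the NIST DET analysis program; as a rough stand-in (a sketch of my own, with hypothetical function names), the following sweeps a threshold over true-speaker and impostor scores to trace DET operating points and estimate the equal error rate:

```python
# Trace DET operating points by sweeping a decision threshold over the scores.
import numpy as np

def det_points(true_scores, impostor_scores):
    """Return (p_miss, p_fa) at every observed score used as a threshold."""
    thresholds = np.sort(np.concatenate([true_scores, impostor_scores]))
    p_miss = np.array([(true_scores < t).mean() for t in thresholds])   # misses
    p_fa = np.array([(impostor_scores >= t).mean() for t in thresholds])  # false alarms
    return p_miss, p_fa

def equal_error_rate(true_scores, impostor_scores):
    """EER: the point where the miss and false-alarm rates are (nearly) equal."""
    p_miss, p_fa = det_points(true_scores, impostor_scores)
    i = np.argmin(np.abs(p_miss - p_fa))
    return (p_miss[i] + p_fa[i]) / 2.0
```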

SVM vs. GMM (Baseline System)
[Figure: DET curves (miss probability in % vs. false alarm probability in %) comparing the SVM system with the GMM-based baseline.]

           GMM     SVM
  EER      25%     16%
  Min DCF  0.2124  0.1406

The EER improved by 9% absolute, and the Min DCF improved by 33% relative. The results are promising, since only a small portion of the available data was used as out-of-class data, and the system uses a very simple averaging strategy to make the yes/no decision; a more intelligent scoring approach should yield better results.

Conclusions and Future Work
The SVM system performed significantly better than the GMM baseline system: the Equal Error Rate improved by 9% absolute, and the Min DCF improved by 33% relative. The analysis of the RBF kernel parameter showed that performance is sensitive to gamma, with the best Min DCF (0.1406) obtained at gamma = 0.019 (C = 50). Together, these results establish a working framework for SVM-based speaker verification on which the following extensions can build.
Future work:
Improve the training efficiency of SVMs using a subsampling strategy: determine the support vectors from small subsets of the training data.
Apply Relevance Vector Machines (RVMs) to speaker verification.
Use more robust features to train the classifier (features that capture the nonlinearities present in the speech signal).
Investigate better techniques for computing the score for a speaker.

Resources
Pattern Recognition Applet: compare popular algorithms on standard or custom data sets.
Speech Recognition Toolkits: compare SVMs and RVMs to standard approaches using a state-of-the-art ASR toolkit.
Foundation Classes: generic C++ implementations of many popular statistical modeling approaches.

References
J. P. Campbell, "Speaker Recognition: A Tutorial," Proceedings of the IEEE, vol. 85, no. 9, pp. 1437-1462, September 1997.
D. A. Reynolds, "Speaker Identification and Verification Using Gaussian Mixture Speaker Models," Speech Communication, vol. 17, pp. 91-108, 1995.
A. Ganapathiraju, "Support Vector Machines for Speech Recognition," Ph.D. dissertation, Department of Electrical and Computer Engineering, Mississippi State University, January 2002.
W. M. Campbell, E. Singer, P. A. Torres-Carrasquillo, and D. A. Reynolds, "Language Recognition with Support Vector Machines," Proceedings of Odyssey: The Speaker and Language Recognition Workshop, Toledo, Spain, pp. 41-44, June 2004.
J. Picone, "Signal Modeling Techniques in Speech Recognition," Proceedings of the IEEE, vol. 81, no. 9, pp. 1215-1247, September 1993.
"NIST 2003 Speaker Recognition Evaluation Plan," http://www.nist.gov/speech/tests/spk/2003/doc/2003-spkrec-evalplan-v2.2.pdf, March 2006.
A. Martin, G. Doddington, M. Ordowski, and M. Przybocki, "The DET Curve in Assessment of Detection Task Performance," Proceedings of Eurospeech, vol. 4, pp. 1895-1898, 1997.