Acoustic Vector Re-sampling for GMMSVM-Based Speaker Verification


Acoustic Vector Re-sampling for GMMSVM-Based Speaker Verification Good morning. Today I am here to give a presentation on "Acoustic Vector Re-sampling for GMMSVM-Based Speaker Verification". Man-Wai MAK and Wei RAO, The Hong Kong Polytechnic University enmwmak@polyu.edu.hk http://www.eie.polyu.edu.hk/~mwmak/

Outline GMM-UBM for Speaker Verification GMM-SVM for Speaker Verification Data-Imbalance Problem in GMM-SVM Utterance Partitioning for GMM-SVM Experiments on NIST SRE Let’s look at the outline of the presentation. First, I will introduce the background of the topic. Second, I will explain the details of Utterance Partitioning with Acoustic Vector Re-sampling. Third, I will briefly describe the experimental setup. Finally, I will present the results on NIST SRE.

Speaker Verification To verify the identity of a claimant based on his/her own voice. "I am Mary." Is this Mary’s voice?

Verification Process [Block diagram] The claimant says "I’m John". Feature extraction converts the utterance into features, which are scored against John’s model (built from John’s "voiceprint", +) and against an impostor model (built from impostors’ "voiceprints", −). The combined scores pass through score normalization and decision making and are compared with a decision threshold to produce Accept/Reject.

Acoustic Features Speech is produced by a continuously evolving vocal tract, so we need to extract a sequence of spectra, or a sequence of spectral coefficients, using a sliding window: a 25 ms window with a 10 ms shift. [MFCC pipeline: Log|X(ω)| → DCT → MFCC]
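As an illustration (not from the original slides), the following is a minimal sketch of this 25 ms / 10 ms MFCC pipeline, assuming the librosa library and a hypothetical 16 kHz mono file speech.wav:

```python
import librosa

# A minimal sketch, assuming "speech.wav" is a 16 kHz mono recording.
y, sr = librosa.load("speech.wav", sr=16000)

# 25 ms analysis window, 10 ms shift, 12 cepstral coefficients
# (matching the feature setup used later in the experiments).
mfcc = librosa.feature.mfcc(
    y=y, sr=sr, n_mfcc=12,
    n_fft=int(0.025 * sr),       # 400 samples = 25 ms window
    hop_length=int(0.010 * sr),  # 160 samples = 10 ms shift
)
delta = librosa.feature.delta(mfcc)  # 12 delta-MFCCs
print(mfcc.shape, delta.shape)       # (12, n_frames) each
```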

GMM-UBM for Speaker Verification The acoustic vectors (MFCCs) of speaker s are modeled by a probability density function parameterized as a Gaussian mixture model (GMM) for speaker s:
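The equation on this slide is not shown in the transcript; in standard notation, the GMM density referred to here is

$$p(\mathbf{x} \mid \Lambda^{(s)}) = \sum_{i=1}^{M} w_i^{(s)}\, \mathcal{N}\big(\mathbf{x};\, \boldsymbol{\mu}_i^{(s)}, \boldsymbol{\Sigma}_i^{(s)}\big),$$

where $\Lambda^{(s)} = \{w_i^{(s)}, \boldsymbol{\mu}_i^{(s)}, \boldsymbol{\Sigma}_i^{(s)}\}_{i=1}^{M}$ are the mixture weights, mean vectors, and covariance matrices of speaker $s$'s GMM.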

GMM-UBM for Speaker Verification The acoustic vectors of a general population are modeled by another GMM, called the universal background model (UBM): [equation label: parameters of the UBM]

GMM-UBM for Speaker Verification [Enrollment diagram] The enrollment utterance X(s) of the client speaker is used to MAP-adapt the universal background model, producing the client speaker model.
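The adaptation equations are not in the transcript; in the standard MAP adaptation of Reynolds et al. (2000; see References), the adapted mean of mixture $i$ is

$$\hat{\boldsymbol{\mu}}_i = \alpha_i\, E_i\big(X^{(s)}\big) + (1 - \alpha_i)\, \boldsymbol{\mu}_i^{\mathrm{ubm}}, \qquad \alpha_i = \frac{n_i}{n_i + r},$$

where $n_i$ is the soft count of enrollment frames assigned to mixture $i$, $E_i(X^{(s)})$ is the posterior-weighted mean of those frames, and $r$ is the relevance factor ($r = 16$ in the experiments reported later).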

GMM-UBM Scoring A 2-class hypothesis problem: H0: the MFCC sequence X(c) comes from the true speaker; H1: the MFCC sequence X(c) comes from an impostor. The verification score is a likelihood ratio. [Diagram: feature extraction → speaker model (+) and background model (−) → score → decision]
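Written out, the likelihood-ratio score is

$$S\big(X^{(c)}\big) = \log p\big(X^{(c)} \mid \Lambda^{(s)}\big) - \log p\big(X^{(c)} \mid \Lambda^{\mathrm{ubm}}\big),$$

and the claim is accepted when $S(X^{(c)})$ exceeds the decision threshold.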

Outline GMM-UBM for Speaker Verification GMM-SVM for Speaker Verification Data-Imbalance Problem in GMM-SVM Acoustic Vector Resampling for GMM-SVM Results on NIST SRE

GMM-SVM for Speaker Verification [Diagram] Feature extraction, MAP adaptation of the UBM, and mean stacking map an utterance to a fixed-length supervector.
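The mapping itself is not shown in the transcript; in the GMM-supervector formulation of Campbell et al. (2006; see References), the normalized MAP-adapted means are stacked, giving a linear kernel between utterances:

$$\vec{\phi}(X) = \Big[\sqrt{w_1}\,\boldsymbol{\Sigma}_1^{-1/2}\hat{\boldsymbol{\mu}}_1;\ \ldots;\ \sqrt{w_M}\,\boldsymbol{\Sigma}_M^{-1/2}\hat{\boldsymbol{\mu}}_M\Big], \qquad K(X_a, X_b) = \vec{\phi}(X_a)^{\mathsf{T}}\, \vec{\phi}(X_b).$$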

GMM-SVM Scoring [Diagram] Feature extraction and UBM-based MAP adaptation compute the GMM-supervector of target speaker s, the GMM-supervectors of the background speakers, and the GMM-supervector of claimant c; the target and background supervectors train the SVM, which then scores the claimant’s supervector.
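The SVM score is not written out on this slide; with support-vector supervectors $\vec{\phi}_i$, labels $y_i \in \{\pm 1\}$, and Lagrange multipliers $\alpha_i$, it takes the usual form

$$f\big(\vec{\phi}(X^{(c)})\big) = \sum_{i \in \mathcal{SV}} \alpha_i\, y_i\, \vec{\phi}_i^{\mathsf{T}}\, \vec{\phi}\big(X^{(c)}\big) + b.$$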

GMM-UBM Scoring vs. GMM-SVM Scoring [Figure labels: normalized GMM-supervector of the claimant’s utterance; normalized GMM-supervector of the target-speaker’s utterance]

Outline GMM-UBM for Speaker Verification GMM-SVM for Speaker Verification Data-Imbalance Problem in GMM-SVM Utterance Partitioning for GMM-SVM Results on NIST SRE

Data Imbalance in GMM-SVM For each target speaker, we have only one utterance (GMM-supervector) from the target speaker but many utterances from the background speakers. So we face a highly imbalanced learning problem: only one training vector comes from the target speaker.
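To make the imbalance concrete, here is a minimal sketch with random stand-ins for real supervectors (the dimensions follow the 256-mixture, 24-dimensional-feature setup used later; the data are purely illustrative):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
dim = 256 * 24                                # 256 mixtures x (12 MFCC + 12 dMFCC)
target_sv = rng.normal(size=(1, dim))         # ONE target-speaker supervector
background_svs = rng.normal(size=(300, dim))  # 300 impostor-class supervectors

X = np.vstack([target_sv, background_svs])
y = np.array([1] + [0] * 300)                 # 1 positive vs. 300 negatives

# With a single positive example, the decision plane is largely
# determined by the negatives -- exactly the problem described above.
svm = SVC(kernel="linear").fit(X, y)
```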

Data Imbalance in GMM-SVM Orientation of the decision boundary depends mainly on impostor-class data

Data Imbalance in GMM-SVM [Figure: a 3-dimensional two-class problem (impostor class vs. speaker class) illustrating that the SVM decision plane is largely governed by the impostor-class supervectors. The green region beneath the decision plane marks where the speaker-class supervector can be located in the supervector space without changing the orientation of the decision plane.]

Outline GMM-UBM for Speaker Verification GMM-SVM for Speaker Verification Data-Imbalance Problem in GMM-SVM Utterance Partitioning for GMM-SVM Results on NIST SRE

Utterance Partitioning Partition an enrollment utterance of a target speaker into a number of sub-utterances, with each sub-utterance producing one GMM-supervector. In the last section, we saw that the situation in GMM-SVM-based speaker verification is special, and conventional over-sampling methods are not appropriate. We therefore propose utterance partitioning, which partitions an enrollment utterance of a target speaker into a number of sub-utterances, each producing one GMM-supervector.

Utterance Partitioning [Diagram: target-speaker’s enrollment utterance and background-speakers’ utterances → feature extraction → UBM-based MAP adaptation and mean stacking → SVM training → SVM of target speaker s] This slide illustrates the procedure of utterance partitioning. After feature extraction, we obtain the sequence of acoustic vectors of the target speaker and partition it into four segments. Through MAP adaptation and mean stacking, we generate five GMM-supervectors (one per segment plus one for the full utterance). Because matching the duration of target-speaker utterances with that of background utterances has been found useful in previous studies, the same partitioning strategy is also applied to the background utterances, yielding 5B GMM-supervectors of background speakers; these are used together for SVM training to create the SVM of the target speaker.

Length-Representation Trade-off When the number of partitions increases, the length of each sub-utterance decreases. If a sub-utterance is too short, its supervector will be almost the same as that of the UBM. Increasing the influence of the target speaker requires more sub-utterances, i.e., more segments, which reduces the length of the sub-utterances. This inevitably compromises the representation power of the sub-utterances and hence of their GMM-supervectors. We propose to address this issue by randomizing the frame order before partitioning takes place; this randomization-and-partitioning process can be repeated several times to produce a desirable number of GMM-supervectors. We name this method "UP-AVR". [Figure label: supervector corresponding to the UBM]

Utterance Partitioning with Acoustic Vector Resampling (UP-AVR) Goal: increase the number of sub-utterances without compromising their representation power. Procedure of UP-AVR: 1. Randomly rearrange the sequence of acoustic vectors in an utterance; 2. Partition the acoustic vectors of the utterance into N segments; 3. If Steps 1 and 2 are repeated R times, we obtain RN+1 target-speaker supervectors. The way to generate more sub-utterances of reasonable length is to use the notion of random re-sampling in bootstrapping. The idea is based on the fact that the MAP adaptation algorithm uses the statistics of the whole utterance to update the GMM parameters; in other words, changing the order of the acoustic vectors will not affect the resulting MAP-adapted model. [Figure: MFCC sequence before and after randomization]
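A minimal sketch of this procedure (illustrative; map_adapt_supervector is a hypothetical callback that performs MAP adaptation and mean stacking):

```python
import numpy as np

def up_avr(frames, map_adapt_supervector, N=4, R=4, rng=None):
    """Utterance Partitioning with Acoustic Vector Resampling (sketch).

    frames: (n_frames, dim) matrix of acoustic vectors (e.g., MFCCs).
    Returns R*N + 1 supervectors: one from the full utterance plus
    N from each of the R randomization rounds.
    """
    rng = rng or np.random.default_rng()
    supervectors = [map_adapt_supervector(frames)]   # full utterance
    for _ in range(R):
        shuffled = rng.permutation(frames)           # Step 1: randomize frame order
        for segment in np.array_split(shuffled, N):  # Step 2: partition into N segments
            supervectors.append(map_adapt_supervector(segment))
    return supervectors                              # R*N + 1 in total
```

Because MAP adaptation depends only on statistics accumulated over all frames, shuffling the frame order leaves the full-utterance model unchanged while giving each round a different set of sub-utterances.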

Utterance Partitioning with Acoustic Vector Resampling (UP-AVR) [Diagram: the target-speaker’s enrollment utterance and the background-speakers’ utterances undergo feature extraction and index randomization; each resulting acoustic-vector sequence is partitioned into four sub-sequences, which pass through UBM-based MAP adaptation and mean stacking; the target-speaker and background supervectors then feed SVM training to produce the SVM of target speaker s.] This slide shows the procedure of UP-AVR. It is similar to the procedure of UP; the only different step is the added index randomization. Using UP-AVR, we can generate more sub-utterances of reasonable length.

Utterance Partitioning with Acoustic Vector Resampling (UP-AVR) Characteristics of supervectors created by UP-AVR: The average pairwise distance between sub-utterance SVs is larger than the average pairwise distance between sub-utterance SVs and the full-utterance SV. The average pairwise distance between the speaker class’s sub-utterance SVs and the impostor class’s SVs is smaller than the average pairwise distance between the speaker class’s full-utterance SV and the impostor class’s SVs. [Figure labels: impostor class; speaker class; sub-utterance supervector; full-utterance supervector]

Nuisance Attribute Projection Nuisance Attribute Projection (NAP) [Solomonoff et al., ICASSP 2005]. Goal: to reduce the effect of session variability. Recall the GMM-supervector kernel. Define the session- and speaker-dependent supervector, then remove the session-dependent part (h) by projecting away the sub-space that causes the session variability: the sub-space representing session variability, defined by V. The new kernel is computed on the projected supervectors.
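The equations on this slide are missing from the transcript; following Solomonoff et al., a standard formulation is

$$\vec{\phi} = \vec{\phi}_{\mathrm{spk}} + \mathbf{V}\mathbf{h}, \qquad \mathbf{P} = \mathbf{I} - \mathbf{V}\mathbf{V}^{\mathsf{T}}, \qquad K_{\mathrm{NAP}}(X_a, X_b) = \big(\mathbf{P}\vec{\phi}(X_a)\big)^{\mathsf{T}} \big(\mathbf{P}\vec{\phi}(X_b)\big),$$

where the orthonormal columns of $\mathbf{V}$ span the session-variability sub-space (rank 64 in the experiments reported later).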

Nuisance Attribute Projection Nuisance Attribute Projection (NAP) [Solomonoff et al., ICASSP 2005] [Figure: sub-space representing session variability, defined by V]

Enrollment Process of GMM-SVM with UP-AVR [Diagram] The MFCCs of an utterance from target speaker s are resampled and partitioned; UBM-based MAP adaptation and mean stacking produce session-dependent supervectors; NAP turns them into session-independent supervectors, which are used for SVM training to obtain the SVM of target speaker s.

Verification Process of GMM-SVM with UP-AVR [Diagram] The MFCCs of a test utterance from claimant c pass through UBM-based MAP adaptation and mean stacking to give a session-dependent supervector; NAP yields a session-independent supervector; SVM scoring against the SVM of target speaker s produces a score, which T-norm (using the T-norm models) converts into a normalized score.

T-Norm (Auckenthaler, 2000) Goal: to shift and scale the verification scores so that a global decision threshold can be used for all speakers. [Diagram: the supervector of the test utterance is scored against T-norm SVMs 1 through R; the mean and standard deviation of these scores are computed and used to normalize the target-speaker SVM score.]
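The normalization formula is not in the transcript; the standard T-norm is

$$S_{\mathrm{tnorm}}\big(X^{(c)}\big) = \frac{S\big(X^{(c)}\big) - \mu_T\big(X^{(c)}\big)}{\sigma_T\big(X^{(c)}\big)},$$

where $\mu_T$ and $\sigma_T$ are the mean and standard deviation of the scores obtained by scoring the test utterance against the R T-norm models.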

Outline GMM-UBM for Speaker Verification GMM-SVM for Speaker Verification Data-Imbalance Problem in GMM-SVM Utterance Partitioning for GMM-SVM Experiments on NIST SRE

Experiments Speech Data: evaluations on NIST SRE 2002 and 2004. NIST SRE 2002: use NIST’01 for computing the UBMs, the impostor-class supervectors of the SVMs, the T-norm models, and the NAP parameters; 2983 true-speaker trials and 36287 impostor attempts; 2-minute utterances for training and about 1-minute utterances for testing. NIST SRE 2004: use the Fisher corpus for computing the UBMs, the impostor-class supervectors of the SVMs, and the T-norm models, and NIST’99 and NIST’00 for computing the NAP parameters; 2386 true-speaker trials and 23838 impostor attempts; 5-minute utterances for training and testing. The table summarizes the roles played by these corpora in the evaluations. NIST’02 and NIST’04 were used for performance evaluations. When the evaluation database is NIST’02, we use the data of NIST’01 to create the UBMs, T-norm models, and impostor class of the SVMs, and to calculate the NAP matrices. When the evaluation database is NIST’04, we use the data of Fisher to create the UBMs, T-norm models, and impostor class of the SVMs, and NIST’99 and NIST’00 to calculate the NAP matrices.

Experiments Features and Models: 12 MFCC + 12 ΔMFCC with feature warping; 1024-mixture GMMs for GMM-UBM; 256-mixture GMMs for GMM-SVM; MAP relevance factor = 16; 300 impostor-class supervectors for GMM-SVM; 200 T-norm models; 64-dimensional session-variability subspace (NAP corank, the rank of V).

Results No. of mixtures in GMM-SVM (NIST’02). [Figure labels: threshold below which the variances of features are deemed too small; normalized; large number of features with small variance]

Results Effects of NAP on Different NIST SREs. [Figure: eigenvalue spectra; large eigenvalues mean large session variation]

Results Effect of NAP Corank on Performance. [Figure label: No NAP]

Results Comparing the discriminative power of GMM-SVM and GMM-SVM with UP-AVR. Fig. 4: Scores produced by SVMs that use one or more speaker-class supervectors (SVs) and 250 background SVs for training. The horizontal axis represents the training/testing SVs. Values inside the square brackets are the mean difference between speaker scores and impostor scores.

Results EER and MinDCF vs. No. of Target-Speaker Supervectors (NIST’02). Figures (a) and (b) show the trends of the EER and minimum DCF as the number of speaker-class supervectors increases. The figures demonstrate that utterance partitioning can reduce the EER and minimum DCF. More importantly, the most significant performance gain is obtained when the number of speaker-class supervectors increases from 1 to 5; the performance levels off when more supervectors are added by increasing the number of resamplings. This is reasonable because a large number of positive supervectors will only result in a large number of zero Lagrange multipliers for the speaker class.

Results Varying the number of resamplings (R) and the number of partitions (N) (NIST’02).

Results [Table 1: NIST’04]

Experiments and Results Performance on NIST’02. [DET curves for GMM-UBM, GMM-SVM, and GMM-SVM with UP-AVR: EER = 9.39%, 9.05%, and 8.16%, respectively]

Experiments and Results Performance on NIST’04. [DET curves] GMM-UBM: EER = 16.05%; GMM-SVM: EER = 10.42%; GMM-SVM with UP-AVR: EER = 9.46%.

References
S.X. Zhang and M.W. Mak, "Optimized Discriminative Kernel for SVM Scoring and its Application to Speaker Verification," IEEE Trans. on Neural Networks, to appear.
M.W. Mak and W. Rao, "Utterance Partitioning with Acoustic Vector Resampling for GMM-SVM Speaker Verification," Speech Communication, vol. 53, no. 1, pp. 119–130, Jan. 2011.
M.W. Mak and W. Rao, "Acoustic Vector Resampling for GMMSVM-Based Speaker Verification," Interspeech 2010, Makuhari, Japan, Sept. 2010, pp. 1449–1452.
S.Y. Kung, M.W. Mak, and S.H. Lin, Biometric Authentication: A Machine Learning Approach, Prentice Hall, 2005.
W.M. Campbell, D.E. Sturim, and D.A. Reynolds, "Support vector machines using GMM supervectors for speaker verification," IEEE Signal Processing Letters, vol. 13, pp. 308–311, 2006.
D.A. Reynolds, T.F. Quatieri, and R.B. Dunn, "Speaker verification using adapted Gaussian mixture models," Digital Signal Processing, vol. 10, pp. 19–41, 2000.