SNR-Invariant PLDA Modeling for Robust Speaker Verification Na Li and Man-Wai Mak Department of Electronic and Information Engineering The Hong Kong Polytechnic.

Slides:



Advertisements
Similar presentations
Known Non-targets for PLDA-SVM Training/Scoring Construction of Discriminative Kernels from Known and Unknown Non-targets for PLDA-SVM Scoring Results.
Advertisements

1 Patrol LID System for DARPA RATS P1 Evaluation Pavel Matejka Patrol Team Language Identification System for DARPA RATS P1 Evaluation Pavel Matejka 1,
Results obtained in speaker recognition using Gaussian Mixture Models Marieta Gâta*, Gavril Toderean** *North University of Baia Mare **Technical University.
Acoustic Vector Re-sampling for GMMSVM-Based Speaker Verification
Computer vision: models, learning and inference Chapter 18 Models for style and identity.
Fusion of HMM’s Likelihood and Viterbi Path for On-line Signature Verification Bao Ly Van - Sonia Garcia Salicetti - Bernadette Dorizzi Institut National.
A Text-Independent Speaker Recognition System
Probabilistic Clustering-Projection Model for Discrete Data
Automatic Identification of Bacterial Types using Statistical Image Modeling Sigal Trattner, Dr. Hayit Greenspan, Prof. Shimon Abboud Department of Biomedical.
An Energy Search Approach to Variable Frame Rate Front-End Processing for Robust ASR Julien Epps and Eric H. C. Choi National ICT Australia Presenter:
Robust Voice Activity Detection for Interview Speech in NIST Speaker Recognition Evaluation Man-Wai MAK and Hon-Bill YU The Hong Kong Polytechnic University.
An Alternative Approach of Finding Competing Hypotheses for Better Minimum Classification Error Training Mr. Yik-Cheung Tam Dr. Brian Mak.
Zhiyao Duan, Gautham J. Mysore, Paris Smaragdis 1. EECS Department, Northwestern University 2. Advanced Technology Labs, Adobe Systems Inc. 3. University.
Speaker Adaptation for Vowel Classification
Clustering.
Language and Speaker Identification using Gaussian Mixture Model Prepare by Jacky Chau The Chinese University of Hong Kong 18th September, 2002.
Gaussian Mixture Example: Start After First Iteration.
What is it? When would you use it? Why does it work? How do you implement it? Where does it stand in relation to other methods? EM algorithm reading group.
The Chinese University of Hong Kong Department of Computer Science and Engineering Lyu0202 Advanced Audio Information Retrieval System.
SNR-Dependent Mixture of PLDA for Noise Robust Speaker Verification
Tous droits réservés © 2005 CRIM The CRIM Systems for the NIST 2008 SRE Patrick Kenny, Najim Dehak and Pierre Ouellet Centre de recherche informatique.
A Unifying Review of Linear Gaussian Models
Advisor: Prof. Tony Jebara
Maximum Likelihood Linear Regression for Speaker Adaptation of Continuous Density Hidden Markov Models C. J. Leggetter and P. C. Woodland Department of.
Normalization of the Speech Modulation Spectra for Robust Speech Recognition Xiong Xiao, Eng Siong Chng, and Haizhou Li Wen-Yi Chu Department of Computer.
HMM-BASED PSEUDO-CLEAN SPEECH SYNTHESIS FOR SPLICE ALGORITHM Jun Du, Yu Hu, Li-Rong Dai, Ren-Hua Wang Wen-Yi Chu Department of Computer Science & Information.
Page 0 of 14 Dynamical Invariants of an Attractor and potential applications for speech data Saurabh Prasad Intelligent Electronic Systems Human and Systems.
Mining Discriminative Components With Low-Rank and Sparsity Constraints for Face Recognition Qiang Zhang, Baoxin Li Computer Science and Engineering Arizona.
VBS Documentation and Implementation The full standard initiative is located at Quick description Standard manual.
Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition Bing Zhang and Spyros Matsoukas BBN Technologies Present.
1 HMM - Part 2 Review of the last lecture The EM algorithm Continuous density HMM.
International Conference on Intelligent and Advanced Systems 2007 Chee-Ming Ting Sh-Hussain Salleh Tian-Swee Tan A. K. Ariff. Jain-De,Lee.
Research & development Component Score Weighting for GMM based Text-Independent Speaker Verification Liang Lu SNLP Unit, France Telecom R&D Beijing
Compensating speaker-to-microphone playback system for robust speech recognition So-Young Jeong and Soo-Young Lee Brain Science Research Center and Department.
1 Improved Speaker Adaptation Using Speaker Dependent Feature Projections Spyros Matsoukas and Richard Schwartz Sep. 5, 2003 Martigny, Switzerland.
Sound-Event Partitioning and Feature Normalization for Robust Sound-Event Detection 2 Department of Electronic and Information Engineering The Hong Kong.
Jun-Won Suh Intelligent Electronic Systems Human and Systems Engineering Department of Electrical and Computer Engineering Speaker Verification System.
Using Support Vector Machines to Enhance the Performance of Bayesian Face Recognition IEEE Transaction on Information Forensics and Security Zhifeng Li,
1 Robust Endpoint Detection and Energy Normalization for Real-Time Speech and Speaker Recognition Qi Li, Senior Member, IEEE, Jinsong Zheng, Augustine.
A Baseline System for Speaker Recognition C. Mokbel, H. Greige, R. Zantout, H. Abi Akl A. Ghaoui, J. Chalhoub, R. Bayeh University Of Balamand - ELISA.
Nick Wang, 25 Oct Speaker identification and verification using EigenVoices O. Thyes, R. Kuhn, P. Nguyen, and J.-C. Junqua in ICSLP2000 Presented.
USE OF IMPROVED FEATURE VECTORS IN SPECTRAL SUBTRACTION METHOD Emrah Besci, Semih Ergin, M.Bilginer Gülmezoğlu, Atalay Barkana Osmangazi University, Electrical.
Tom Ko and Brian Mak The Hong Kong University of Science and Technology.
HMM - Part 2 The EM algorithm Continuous density HMM.
Multi-Speaker Modeling with Shared Prior Distributions and Model Structures for Bayesian Speech Synthesis Kei Hashimoto, Yoshihiko Nankaku, and Keiichi.
A Semi-Blind Technique for MIMO Channel Matrix Estimation Aditya Jagannatham and Bhaskar D. Rao The proposed algorithm performs well compared to its training.
Speaker Identification by Combining MFCC and Phase Information Longbiao Wang (Nagaoka University of Technologyh, Japan) Seiichi Nakagawa (Toyohashi University.
Paper Reading Dalong Du Nov.27, Papers Leon Gu and Takeo Kanade. A Generative Shape Regularization Model for Robust Face Alignment. ECCV08. Yan.
Voice Activity Detection based on OptimallyWeighted Combination of Multiple Features Yusuke Kida and Tatsuya Kawahara School of Informatics, Kyoto University,
Variational Bayesian Methods for Audio Indexing
Gaussian Mixture Models and Expectation-Maximization Algorithm.
Chapter 13 (Prototype Methods and Nearest-Neighbors )
An i-Vector PLDA based Gender Identification Approach for Severely Distorted and Multilingual DARPA RATS Data Shivesh Ranjan, Gang Liu and John H. L. Hansen.
Speaker Verification Using Adapted GMM Presented by CWJ 2000/8/16.
November 9th, 2010 Low Complexity EM-based Decoding for OFDM Systems with Impulsive Noise Asilomar Conference on Signals, Systems, and Computers
Modeling Annotated Data (SIGIR 2003) David M. Blei, Michael I. Jordan Univ. of California, Berkeley Presented by ChengXiang Zhai, July 10, 2003.
EM Algorithm 主講人:虞台文 大同大學資工所 智慧型多媒體研究室. Contents Introduction Example  Missing Data Example  Mixed Attributes Example  Mixture Main Body Mixture Model.
Madhulika Pannuri Intelligent Electronic Systems Human and Systems Engineering Department of Electrical and Computer Engineering Correlation Dimension.
A Study on Speaker Adaptation of Continuous Density HMM Parameters By Chin-Hui Lee, Chih-Heng Lin, and Biing-Hwang Juang Presented by: 陳亮宇 1990 ICASSP/IEEE.
Experience Report: System Log Analysis for Anomaly Detection
Outlier Processing via L1-Principal Subspaces
Orthogonal Subspace Projection - Matched Filter
3. Applications to Speaker Verification
Learning with information of features
ECE539 final project Instructor: Yu Hen Hu Fall 2005
Lecture 14 PCA, pPCA, ICA.
SMEM Algorithm for Mixture Models
Stochastic Optimization Maximization for Latent Variable Models
Decision Making Based on Cohort Scores for
SNR-Invariant PLDA Modeling for Robust Speaker Verification
Presentation transcript:

SNR-Invariant PLDA Modeling for Robust Speaker Verification Na Li and Man-Wai Mak Department of Electronic and Information Engineering The Hong Kong Polytechnic University, Hong Kong SAR, China Interspeech 2015 Dresden, Germany

2 Contents 1.Background and Motivation of Work 2.SNR-invariant PLDA modeling for Robust Speaker Verification 3.Experiments on SRE12 4.Conclusions 2

3 Background I-vector/PLDA Framework m : Global mean of all i-vectors V : Bases of speaker subspace : Latent speaker factor with a standard normal distribution : Residual term follows a Gaussian distribution with zero mean and full covariance :Length-normalized i-vector of speaker i

4 Background In conventional multi-condition training, we pool i-vectors from various background noise levels to train m, V and Σ. EM Algorithm I-vectors with 2 SNR ranges

5 Motivation We argue that the variation caused by SNR can be modeled by an SNR subspace and utterances falling within a narrow SNR range should share the same set of SNR factors. SNR Subspace SNR Factor 2 Group1 Group2 Group3 SNR Factor 1 SNR Factor 3

Motivation 6 6 dB Method of modeling SNR information clean 15 dB SNR Subspace I-vector Space i-vector

7 Distribution of SNR in SRE12 Each SNR region is handled by a specific set of SNR factors

8 Contents 1.Background 2.Motivation of Work 3.SNR-invariant PLDA modeling for Robust Speaker Verification 4.Experiments on SRE12 5.Conclusions

9 SNR-invariant PLDA PLDA: By adding an SNR factor to the conventional PLDA, we have SNR-invariant PLDA : where U denotes the SNR subspace, is an SNR factor, and is the speaker (identity) factor for speaker i. Note that it is not the same as PLDA with channel subspace: i: Speaker index j: Session index k: SNR index

10 SNR-invariant PLDA We separate I-vectors into different groups according to the SNR of their utterances EM Algorithm

11 Compared with Conventional PLDA Conventional PLDA SNR-Invariant PLDA

12 PLDA vs SNR-invariant PLDA Generative Model

13 PLDA vs SNR-invariant PLDA E-Step

14 PLDA versus SNR-invariant PLDA M-Step

SNR-invariant PLDA Score 15 Likelihood Ratio Scores

16 Contents 1.Motivation of Work 2.Conventional PLDA 3.Mixture of PLDA for Noise Robust Speaker Verification 4.Experiments on SRE12 5.Conclusions

17 Data and Features Evaluation dataset: Common evaluation condition 1 and 4 of NIST SRE 2012 core set. Parameterization: 19 MFCCs together with energy plus their 1 st and 2 nd derivatives  60-Dim UBM: gender-dependent, 1024 mixtures Total Variability Matrix: gender-dependent, 500 total factors I-Vector Preprocessing:  Whitening by WCCN then length normalization  Followed by NFA (500-dim  200-dim)

18 Finding SNR Groups Training Utterances

SNR Distributions SNR Distribution of training and test utterances in CC4 19 Test Utterances Training Utterances

Performance on SRE12 MethodParametersMaleFemale KQEER(%)minDCFEER(%)minDCF PLDA mPLDA SNR- Invariant PLDA No. of SNR Groups No. of SNR factors (dim of ) 20 CC1 Mixture of PLDA (Mak, Interspeech14)

Performance on SRE12 MethodParametersMaleFemale KQEER(%)minDCFEER(%)minDCF PLDA mPLDA SNR- Invariant PLDA No. of SNR Groups 21 No. of SNR factors (dim of ) CC4

Performance on SRE12 CC4, Female Conventional PLDA SNR-Invariant PLDA 22

Conclusions We show that while I-vectors of different SNR fall on different regions of the I-vector space, they vary within a single cluster in an SNR-subspace. Therefore, it is possible to model the SNR variability by adding an SNR loading matrix and SNR factors to the conventional PLDA model. 23