Published by Ferdinand Wade. Modified over 9 years ago.
1
SNR-Dependent Mixture of PLDA for Noise Robust Speaker Verification
Man-Wai Mak
Interspeech 2014
Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, China
2
Contents
Motivation of Work
Conventional PLDA
Mixture of PLDA for Noise Robust Speaker Verification
Experiments on SRE12
Conclusions
3
Motivation
Conventional i-vector/PLDA systems use a single PLDA model to handle all SNR conditions.
[Diagram labels: Enrollment Utterances; I-Vector/PLDA Scoring; PLDA Score]
4
Motivation
We argue that a PLDA model should focus on a small range of SNR.
[Diagram labels: PLDA Model 1, PLDA Model 2, PLDA Model 3, each paired with a PLDA Score]
5
Distribution of SNR in SRE12
Each SNR region is handled by a PLDA Model
6
Proposed Solution
The full spectrum of SNRs is handled by a mixture of PLDA in which the posteriors of the indicator variables depend on the utterance's SNR.
[Diagram labels: SNR Estimator; SNR Posterior Estimator; PLDA Model 1; PLDA Model 2; PLDA Model 3; PLDA Score]
7
Key Features of Proposed Solution
Verification scores depend not only on the same-speaker and different-speaker likelihoods but also on the posterior probabilities of SNR.
9
Probabilistic LDA (PLDA)
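In PLDA, the i-vectors x are modeled by a factor analyzer of the form:
$x_{ij} = m + V z_i + \epsilon_{ij}$
where
$x_{ij}$: i-vector extracted from the j-th session of the i-th speaker
$m$: global mean of all i-vectors
$V$: speaker factor loading matrix
$z_i$: speaker factor
$\epsilon_{ij}$: residual noise with covariance $\Sigma$
With a standard normal prior on $z_i$, the density of x is
$p(x) = \mathcal{N}(x \mid m, VV^\top + \Sigma)$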
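The factor-analysis view can be checked numerically. The sketch below assumes the standard PLDA form x_ij = m + V z_i + eps_ij with z_i ~ N(0, I) and eps_ij ~ N(0, Sigma); the function name `sample_plda` and all parameter values are hypothetical and chosen only for illustration. It draws one session per speaker and compares the empirical covariance with V V^T + Sigma.

```python
import numpy as np

def sample_plda(m, V, Sigma, n_speakers, n_sessions, rng):
    """Draw i-vectors x_ij = m + V z_i + eps_ij (illustrative helper, not from the slides)."""
    D, M = V.shape
    z = rng.standard_normal((n_speakers, M))           # one speaker factor per speaker
    eps = rng.multivariate_normal(np.zeros(D), Sigma,  # session-dependent residual
                                  size=(n_speakers, n_sessions))
    return m + (z @ V.T)[:, None, :] + eps             # shape: (speakers, sessions, D)

rng = np.random.default_rng(0)
m = np.array([1.0, -2.0])          # hypothetical global mean
V = np.array([[2.0], [1.0]])       # hypothetical 2x1 loading matrix (1-D speaker subspace)
Sigma = np.diag([0.5, 0.3])        # hypothetical residual covariance

X = sample_plda(m, V, Sigma, n_speakers=20000, n_sessions=1, rng=rng)
emp_cov = np.cov(X.reshape(-1, 2), rowvar=False)
model_cov = V @ V.T + Sigma        # covariance implied by the marginal density of x
```

With enough samples, `emp_cov` should approach `model_cov`, which is the covariance of the Gaussian density of x.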
10
Probabilistic LDA (PLDA)
The PLDA parameters ω={m, V, Σ} are estimated by maximizing
12
Mixture of PLDA
Model parameters of mPLDA:
Parameters for modeling the SNR of utterances
Parameters for modeling SNR-dependent i-vectors
13
Generative Model for mPLDA
[Equations: generative model for mPLDA. Labels: SNR in dB; posterior probability of SNR]
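One way to obtain a posterior over mixture components from an utterance's SNR is a one-dimensional Gaussian mixture over SNR in dB. The sketch below makes that assumption explicitly; the slide's own equation is not available here, and the function name `snr_posteriors` and the three component settings are hypothetical.

```python
import numpy as np

def snr_posteriors(snr_db, weights, means, stds):
    """Posterior of each mixture component given SNR (dB), assuming a 1-D GMM over SNR."""
    snr = np.atleast_1d(np.asarray(snr_db, dtype=float))[:, None]
    log_pdf = -0.5 * np.log(2 * np.pi * stds**2) - 0.5 * ((snr - means) / stds) ** 2
    log_joint = np.log(weights) + log_pdf
    log_joint -= log_joint.max(axis=1, keepdims=True)   # stabilise before exponentiating
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)       # each row sums to 1

# Hypothetical 3-component SNR model (values are illustrative only).
weights = np.array([1 / 3, 1 / 3, 1 / 3])
means = np.array([6.0, 15.0, 30.0])
stds = np.array([3.0, 3.0, 3.0])

post = snr_posteriors([6.0, 10.5, 40.0], weights, means, stds)
```

Each row of `post` holds the component posteriors for one utterance; an utterance whose SNR lies between two component means receives weight from both.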
14
PLDA vs mPLDA: Generative Model
[Side-by-side panels: PLDA; Mixture of PLDA]
15
Likelihood-Ratio Scores of mPLDA
Same-speaker likelihood:
[Equation labels: SNR of target and test utterances; i-vectors of target and test speakers]
16
Likelihood-Ratio Scores of mPLDA
Different-speaker likelihood:
Verification Score = Same-speaker likelihood / Different-speaker likelihood
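The likelihood ratio can be sketched in code under stated assumptions. The version below assumes that each utterance's mixture component is weighted by its SNR posterior, that the same-speaker hypothesis ties the two i-vectors through a shared speaker factor, and that the different-speaker hypothesis treats them as independent. This is one plausible reading of the slides, not a confirmed reproduction of the author's equations; `log_gauss`, `mplda_llr` and all numbers are hypothetical.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log density of a multivariate Gaussian."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(cov, d))

def mplda_llr(xs, xt, post_s, post_t, models):
    """Log-likelihood ratio; models is a list of (m_k, V_k, Sigma_k) per SNR component."""
    log_same, log_diff = [], []
    for (ms, Vs, Ss), ps in zip(models, post_s):
        for (mt, Vt, St), pt in zip(models, post_t):
            w = np.log(ps) + np.log(pt)                 # weight from both SNR posteriors
            tot_s, tot_t = Vs @ Vs.T + Ss, Vt @ Vt.T + St
            joint = np.block([[tot_s, Vs @ Vt.T],       # shared z couples xs and xt
                              [Vt @ Vs.T, tot_t]])
            log_same.append(w + log_gauss(np.concatenate([xs, xt]),
                                          np.concatenate([ms, mt]), joint))
            log_diff.append(w + log_gauss(xs, ms, tot_s) + log_gauss(xt, mt, tot_t))
    return np.logaddexp.reduce(log_same) - np.logaddexp.reduce(log_diff)

# Hypothetical two-component model: same subspace, different residual noise levels.
m0, V0 = np.zeros(2), np.array([[2.0], [1.0]])
models = [(m0, V0, 0.1 * np.eye(2)), (m0, V0, 0.5 * np.eye(2))]
post_s, post_t = np.array([0.9, 0.1]), np.array([0.2, 0.8])

xs = np.array([3.0, 1.5])
score_same = mplda_llr(xs, np.array([3.1, 1.4]), post_s, post_t, models)
score_diff = mplda_llr(xs, np.array([-3.0, -1.5]), post_s, post_t, models)
```

Working in the log domain with `np.logaddexp.reduce` avoids underflow when the Gaussian densities are very small, which is typical for high-dimensional i-vectors.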
17
PLDA vs mPLDA: Auxiliary Function
[Equation panels: PLDA; Mixture of PLDA]
[Equation labels: latent indicator variables; latent speaker factors; SNR of training utterances; no. of mixtures; speaker indexes; session indexes]
18
PLDA vs mPLDA: E-Step
[Equation panels: PLDA; Mixture of PLDA]
19
PLDA vs mPLDA: M-Step
[Equation panels: PLDA; Mixture of PLDA]
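For the conventional PLDA side of this comparison, the E- and M-steps can be illustrated with the textbook EM updates for x_ij = m + V z_i + eps_ij. The sketch below is a minimal version under simplifying assumptions that are not taken from the slides: every speaker has the same number of sessions, Sigma is a full covariance matrix, and m is fixed to the global mean. The mixture (mPLDA) updates are not attempted. The function name `plda_em` and the toy data are hypothetical.

```python
import numpy as np

def plda_em(X, n_factors, n_iter=50, seed=0):
    """Textbook EM for PLDA. X has shape (speakers, sessions, D); returns m, V, Sigma."""
    S, H, D = X.shape
    rng = np.random.default_rng(seed)
    m = X.reshape(-1, D).mean(axis=0)
    Xc = X - m
    V = rng.standard_normal((D, n_factors))           # random initial loading matrix
    Sigma = np.cov(Xc.reshape(-1, D), rowvar=False)   # start from the total covariance
    f = Xc.sum(axis=1)                                # per-speaker sum of centred i-vectors
    Sxx = np.einsum('shd,she->de', Xc, Xc)            # total scatter matrix
    for _ in range(n_iter):
        # E-step: posterior moments of the speaker factors z_i
        SiV = np.linalg.solve(Sigma, V)               # Sigma^{-1} V
        Linv = np.linalg.inv(np.eye(n_factors) + H * V.T @ SiV)
        Ez = f @ SiV @ Linv                           # rows are <z_i>
        Ezz = S * Linv + Ez.T @ Ez                    # sum over speakers of <z_i z_i^T>
        # M-step: re-estimate V and Sigma
        A = f.T @ Ez                                  # sum_ij (x_ij - m) <z_i>^T
        V = A @ np.linalg.inv(H * Ezz)
        Sigma = (Sxx - V @ A.T) / (S * H)
        Sigma = 0.5 * (Sigma + Sigma.T)               # guard against round-off asymmetry
    return m, V, Sigma

# Toy data: 500 hypothetical speakers, 5 sessions each, 3-dim vectors, 1 speaker factor.
rng = np.random.default_rng(1)
V_true = np.array([[2.0], [1.0], [0.5]])
z = rng.standard_normal((500, 1))
X = (z @ V_true.T)[:, None, :] + np.sqrt(0.5) * rng.standard_normal((500, 5, 3))
m_hat, V_hat, Sigma_hat = plda_em(X, n_factors=1)
```

After convergence, `V_hat @ V_hat.T + Sigma_hat` should be close to the empirical covariance of the training vectors, which is a convenient sanity check on an implementation.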
21
Experiments
Evaluation dataset: Common evaluation condition 2 of the NIST SRE 2012 core set.
Parameterization: 19 MFCCs together with energy, plus their 1st and 2nd derivatives (60-dim)
UBM: gender-dependent, 1024 mixtures
Total Variability Matrix: gender-dependent, 500 total factors
I-Vector Preprocessing: Whitening by WCCN, then length normalization, followed by LDA (500-dim → 200-dim) and WCCN
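Two of the preprocessing steps listed above, WCCN and length normalization, can be sketched as follows. This is an illustrative version on toy data, not the authors' pipeline: the LDA step is omitted, and the helper names `within_class_cov`, `wccn_transform` and `length_normalize` are hypothetical.

```python
import numpy as np

def within_class_cov(X, labels):
    """Average of the per-speaker covariance matrices."""
    classes = np.unique(labels)
    W = np.zeros((X.shape[1], X.shape[1]))
    for c in classes:
        Xc = X[labels == c]
        Xc = Xc - Xc.mean(axis=0)
        W += Xc.T @ Xc / len(Xc)
    return W / len(classes)

def wccn_transform(X, labels):
    """Return B with B B^T = W^{-1}, so that X @ B has identity within-class covariance."""
    W = within_class_cov(X, labels)
    return np.linalg.cholesky(np.linalg.inv(W))

def length_normalize(X):
    """Scale every vector to unit Euclidean norm."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

# Toy data: 10 hypothetical speakers, 20 vectors each, 4 dimensions.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 20)
speaker_means = 3.0 * rng.standard_normal((10, 4))
X = rng.standard_normal((200, 4)) * np.array([1.0, 2.0, 0.5, 3.0]) + speaker_means[labels]

B = wccn_transform(X, labels)
Y = X @ B                      # within-class covariance of Y is the identity
Y_norm = length_normalize(Y)   # every row now has unit length
```

The Cholesky factor is one of several valid choices for B; any matrix satisfying B B^T = W^{-1} normalizes the within-class covariance in the same way.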
22
Experiments
In NIST 2012 SRE, training utterances from telephone channels are clean, but some of the test utterances are noisy.
We used the FaNT tool to add babble noise to the clean training utterances.
[Diagram labels: Babble noise; Utterances from microphone channels; FaNT; From telephone channels]
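The idea of mixing noise into clean speech at a chosen SNR can be sketched as below. This does not reproduce FaNT: it uses plain average power over the whole signal, which is a simplification, and the function name `add_noise_at_snr` and the synthetic signals are hypothetical stand-ins for real audio.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Scale the noise so that speech power / noise power equals snr_db, then add it."""
    noise = np.resize(noise, speech.shape)        # repeat or trim noise to match length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise

# Synthetic stand-ins for a clean utterance and a shorter noise recording.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
noise = 0.1 * rng.standard_normal(4000)

noisy = add_noise_at_snr(speech, noise, snr_db=6.0)
measured_snr = 10 * np.log10(np.mean(speech ** 2) / np.mean((noisy - speech) ** 2))
```

For real recordings, measuring speech power only over voiced frames (for example with a VAD) would give a more meaningful SNR than the whole-signal average used here.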
23
Performance on SRE12
Train on tel+mic speech and test on noisy tel speech (CC4)
Train on tel+mic speech and test on tel speech recorded in noisy environments (CC5)
Use FaNT and a VAD to determine the SNR of test utterances. See our ISCSLP14 paper.
24
Performance on SRE12
Train on tel+mic speech and test on noisy tel speech (CC4)
Use FaNT and a VAD to determine the SNR of test utterances.
[Figure panels: Male; Female. Legend: PLDA; mPLDA]
25
Conclusions
Mixture of SNR-dependent PLDA is a flexible model that can handle noisy speech with a wide range of SNR.
The contributions of the mixtures are probabilistically combined based on the SNR of the test utterances and the target speaker's utterances.
Results show that the mixture PLDA performs better than conventional PLDA whenever the SNR of test utterances varies widely.
26
Hard-Decision Mixture of PLDA
27
Training of mPLDA
Auxiliary function:
[Equation labels: latent indicator variables; latent speaker factors; SNR of training utterances; no. of mixtures; speaker indexes; session indexes]
28
PLDA Scoring
xs and xt share the same z
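Because a same-speaker pair shares one latent factor z, the two i-vectors are jointly Gaussian with cross-covariance V V^T, whereas a different-speaker pair is independent. Assuming the standard PLDA model with a standard normal prior on z, and writing x_s and x_t for the i-vectors denoted xs and xt above, the log-likelihood-ratio score takes the following form. The symbols \Sigma_{\mathrm{tot}}, \Sigma_{\mathrm{ac}} and S are introduced here for illustration and do not come from the slides.

```latex
\begin{aligned}
\Sigma_{\mathrm{tot}} &= VV^{\top} + \Sigma, \qquad \Sigma_{\mathrm{ac}} = VV^{\top} \\
S(x_s, x_t) &= \log \mathcal{N}\!\left(
  \begin{bmatrix} x_s \\ x_t \end{bmatrix} \middle|\,
  \begin{bmatrix} m \\ m \end{bmatrix},
  \begin{bmatrix} \Sigma_{\mathrm{tot}} & \Sigma_{\mathrm{ac}} \\
                  \Sigma_{\mathrm{ac}} & \Sigma_{\mathrm{tot}} \end{bmatrix}
\right)
- \log \mathcal{N}(x_s \mid m, \Sigma_{\mathrm{tot}})
- \log \mathcal{N}(x_t \mid m, \Sigma_{\mathrm{tot}})
\end{aligned}
```

The off-diagonal block \Sigma_{\mathrm{ac}} is the only difference between the two hypotheses: setting it to zero recovers the different-speaker likelihood.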
29
Probabilistic LDA (PLDA)
PLDA example: 2-D data in a 1-D subspace z
Take a sample according to p(z)
Source: S. Prince, "Computer Vision: Models, Learning, and Inference", 2012