SNR-Dependent Mixture of PLDA for Noise Robust Speaker Verification

1 SNR-Dependent Mixture of PLDA for Noise Robust Speaker Verification
Man-Wai Mak
Interspeech 2014
Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, China

2 Contents
Motivation of Work
Conventional PLDA
Mixture of PLDA for Noise Robust Speaker Verification
Experiments on SRE12
Conclusions

3 Motivation
Conventional i-vector/PLDA systems use a single PLDA model to handle all SNR conditions.
[Figure: I-Vector/PLDA Scoring. Labels: Enrollment Utterances, PLDA Score]

4 Motivation
We argue that a PLDA model should focus on a small range of SNR.
[Figure: PLDA Models 1, 2, and 3, each with its own PLDA score]

5 Distribution of SNR in SRE12
Each SNR region is handled by a PLDA Model

6 Proposed Solution
The full spectrum of SNRs is handled by a mixture of PLDA in which the posteriors of the indicator variables depend on the utterance’s SNR.
[Figure: block diagram with an SNR estimator, an SNR posterior estimator, PLDA Models 1 to 3, and the resulting PLDA score]

7 Key Features of Proposed Solution
Verification scores depend not only on the same-speaker and different-speaker likelihoods but also on the posterior probabilities of SNR.

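9 Probabilistic LDA (PLDA)
In PLDA, the i-vectors x are modeled by a factor analyzer of the form:
$x_{ij} = m + V z_i + \epsilon_{ij}$
$x_{ij}$: i-vector extracted from the j-th session of the i-th speaker
$m$: global mean of all i-vectors
$V$: speaker factor loading matrix
$z_i$: speaker factor (standard normal prior)
$\epsilon_{ij}$: residual noise with covariance Σ
Density of x is
$p(x) = \mathcal{N}(x \mid m, VV^{\mathsf T} + \Sigma)$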

10 Probabilistic LDA (PLDA)
The PLDA parameters ω={m, V, Σ} are estimated by maximizing
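The objective on this slide is not preserved in the transcript. A common form of the EM auxiliary function for PLDA, given here as a sketch that may differ in notation from the slide, is the expected complete-data log-likelihood. The symbols $\omega'$ (new parameters), $\mathcal{X}$ (training i-vectors), and $Z$ (latent speaker factors) are assumptions introduced for illustration.

```latex
Q(\omega' \mid \omega)
  = \mathbb{E}_{Z}\!\left\{ \sum_{i}\sum_{j}
      \ln\!\big[\, p(x_{ij} \mid z_i, \omega')\, p(z_i) \,\big]
    \;\middle|\; \mathcal{X}, \omega \right\}
```

Here $i$ runs over speakers and $j$ over the sessions of speaker $i$; the expectation is taken under the posterior of the speaker factors given the current parameters $\omega$.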

12 Mixture of PLDA
Model parameters of mPLDA:
Parameters for modeling the SNR of utts.
Parameters for modeling SNR-dependent i-vectors
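The parameter symbols on this slide are not preserved. One plausible way to write the two groups, by analogy with the conventional PLDA parameters ω = {m, V, Σ}, is sketched below. The symbols $\pi_k$, $\mu_k$, $\sigma_k$ (a 1-D Gaussian mixture over SNR) and the per-mixture index $k = 1, \dots, K$ are assumptions, not taken from the slide.

```latex
\theta = \big\{
  \underbrace{\pi_k,\ \mu_k,\ \sigma_k}_{\text{SNR of utterances}},\;
  \underbrace{m_k,\ V_k,\ \Sigma_k}_{\text{SNR-dependent i-vectors}}
\big\}_{k=1}^{K}
```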

13 Generative Model for mPLDA
[Equations: the mPLDA generative model, where the SNR is in dB, and the posterior probability of SNR]
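The posterior of SNR can be illustrated with a small sketch. It assumes the SNR (in dB) is modeled by a 1-D Gaussian mixture, so the posterior of each mixture component follows from Bayes' rule. The function names and the three-component parameter values are hypothetical and chosen only for illustration.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def snr_posteriors(snr_db, weights, means, stds):
    """Posterior probability of each mixture component given an SNR in dB."""
    joint = [w * gaussian_pdf(snr_db, mu, sd)
             for w, mu, sd in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical 3-component SNR model (values are illustrative only).
weights = [1.0 / 3, 1.0 / 3, 1.0 / 3]
means = [6.0, 15.0, 30.0]   # dB
stds = [3.0, 3.0, 5.0]      # dB

# An utterance at 6 dB is assigned mostly to the first component.
post = snr_posteriors(6.0, weights, means, stds)
```

For SNR values far from every component mean the densities can underflow to zero; a production version would work in the log domain.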

14 PLDA vs mPLDA: Generative Model
[Figure: generative models of PLDA and of the mixture of PLDA]

15 Likelihood-Ratio Scores of mPLDA
Same-speaker likelihood: [equation, annotated with: SNR of target and test utterances; i-vectors of target and test speakers]

16 Likelihood-Ratio Scores of mPLDA
Different-speaker likelihood: [equation]
Verification score = (same-speaker likelihood) / (different-speaker likelihood)
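The likelihood-ratio score can be sketched on a toy problem with scalar "i-vectors". This is one reading of how the SNR posteriors enter the score, not the slide's exact equations, which are not preserved: the same-speaker likelihood sums over pairs of mixture components weighted by the SNR posteriors of the target and test utterances, with the two i-vectors sharing one speaker factor; the different-speaker likelihood is the product of the two marginal mixtures. The `models` layout `(m, v, sigma2)` and all numbers are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def bivariate_normal_pdf(x1, x2, m1, m2, var1, var2, cov12):
    """Density of a 2-D Gaussian with covariance [[var1, cov12], [cov12, var2]]."""
    det = var1 * var2 - cov12 ** 2
    d1, d2 = x1 - m1, x2 - m2
    quad = (var2 * d1 * d1 - 2.0 * cov12 * d1 * d2 + var1 * d2 * d2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def mplda_llr(x_s, x_t, post_s, post_t, models):
    """Log-likelihood ratio for scalar i-vectors x_s (target) and x_t (test).

    post_s, post_t: SNR posteriors of the target and test utterances.
    models: one (m, v, sigma2) tuple per mixture component (assumed layout).
    """
    same = 0.0
    for ps, (m_s, v_s, s2_s) in zip(post_s, models):
        for pt, (m_t, v_t, s2_t) in zip(post_t, models):
            # Shared speaker factor z gives cross-covariance v_s * v_t.
            same += ps * pt * bivariate_normal_pdf(
                x_s, x_t, m_s, m_t,
                v_s * v_s + s2_s, v_t * v_t + s2_t, v_s * v_t)
    marg_s = sum(p * normal_pdf(x_s, m, v * v + s2)
                 for p, (m, v, s2) in zip(post_s, models))
    marg_t = sum(p * normal_pdf(x_t, m, v * v + s2)
                 for p, (m, v, s2) in zip(post_t, models))
    return math.log(same) - math.log(marg_s * marg_t)

# Hypothetical two-component model and SNR posteriors.
models = [(0.0, 1.0, 0.1), (0.5, 0.8, 0.3)]
post = [0.7, 0.3]
llr_close = mplda_llr(1.0, 1.0, post, post, models)   # similar i-vectors
llr_far = mplda_llr(1.0, -1.0, post, post, models)    # dissimilar i-vectors
```

With a single component and posterior `[1.0]`, the sketch reduces to conventional PLDA scoring.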

17 PLDA vs mPLDA: Auxiliary Function
PLDA: [equation]
Mixture of PLDA: [equation]
Annotations: latent indicator variables; no. of mixtures; latent speaker factors; SNR of training utterances; speaker indexes; session indexes

18 PLDA vs mPLDA: E-Step
[Equations: E-step of PLDA and of the mixture of PLDA]

19 PLDA vs mPLDA: M-Step
[Equations: M-step of PLDA and of the mixture of PLDA]

21 Experiments
Evaluation dataset: Common evaluation condition 2 of NIST SRE 2012 core set.
Parameterization: 19 MFCCs together with energy plus their 1st and 2nd derivatives → 60-dim
UBM: gender-dependent, 1024 mixtures
Total Variability Matrix: gender-dependent, 500 total factors
I-Vector Preprocessing: Whitening by WCCN then length normalization, followed by LDA (500-dim → 200-dim) and WCCN
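Two of the steps above are simple enough to sketch directly: the feature dimension (19 MFCCs plus energy, with first and second derivatives) and length normalization, which scales each i-vector to unit Euclidean norm. The WCCN and LDA steps are omitted because they need labeled training data. The function name is hypothetical.

```python
import math

def length_normalize(ivector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in ivector))
    if norm == 0.0:
        raise ValueError("cannot length-normalize a zero vector")
    return [v / norm for v in ivector]

# 19 MFCCs + energy = 20 static features; adding 1st and 2nd
# derivatives triples the dimension.
static_dim = 19 + 1
feature_dim = static_dim * 3

# Toy 2-D example in place of a 500-dim i-vector.
unit = length_normalize([3.0, 4.0])
unit_norm = math.sqrt(sum(v * v for v in unit))
```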

22 Experiments
In NIST 2012 SRE, training utterances from telephone channels are clean, but some of the test utterances are noisy.
We used the FaNT tool to add babble noise to the clean training utterances.
[Figure labels: Babble noise; Utterances from microphone channels; FaNT; From telephone channels]
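The idea of adding noise at a chosen SNR can be sketched as follows. This is not FaNT's implementation: FaNT's own level measurement and filtering options are not reproduced here, and the sketch simply scales the noise so that the ratio of mean speech power to mean noise power matches the target. The function names and the toy signals are hypothetical.

```python
import math
import random

def mean_power(signal):
    """Average squared amplitude of a signal."""
    return sum(s * s for s in signal) / len(signal)

def add_noise_at_snr(speech, noise, target_snr_db):
    """Mix noise into speech so the power-based SNR equals target_snr_db.

    Returns the noisy signal and the scaled noise that was added.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    # Choose gain so that 10*log10(p_speech / (gain^2 * p_noise)) = target.
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (target_snr_db / 10.0)))
    scaled = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled)]
    return noisy, scaled

# Toy signals standing in for a clean utterance and babble noise.
rng = random.Random(0)
speech = [math.sin(0.05 * t) for t in range(1000)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
noisy, scaled = add_noise_at_snr(speech, noise, 6.0)
achieved_snr_db = 10.0 * math.log10(mean_power(speech) / mean_power(scaled))
```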

23 Performance on SRE12
Train on tel+mic speech and test on noisy tel speech (CC4)
Train on tel+mic speech and test on tel speech recorded in noisy environments (CC5)
Use FaNT and a VAD to determine the SNR of test utts. See our ISCSLP14 paper.
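How a VAD can help determine SNR is sketched below under simple assumptions; the actual procedure is described in the ISCSLP14 paper mentioned above and is not reproduced here. The sketch treats non-speech samples as noise only and speech samples as speech plus noise, then subtracts the noise power. The per-sample VAD decisions and the function name are hypothetical.

```python
import math

def estimate_snr_db(samples, is_speech):
    """Estimate SNR in dB from samples and per-sample VAD decisions."""
    speech = [s for s, v in zip(samples, is_speech) if v]
    noise = [s for s, v in zip(samples, is_speech) if not v]
    if not speech or not noise:
        raise ValueError("need both speech and non-speech samples")
    p_speech_plus_noise = sum(s * s for s in speech) / len(speech)
    p_noise = sum(s * s for s in noise) / len(noise)
    if p_noise == 0.0:
        raise ValueError("noise power is zero; SNR is unbounded")
    # Speech regions contain speech plus noise, so subtract the noise power.
    p_speech = max(p_speech_plus_noise - p_noise, 1e-12)
    return 10.0 * math.log10(p_speech / p_noise)

# Toy signal: speech regions with power 11, noise-only regions with power 1,
# so the speech-only power is 10 times the noise power.
amp = math.sqrt(11.0)
samples = [amp, -amp] * 50 + [1.0, -1.0] * 50
is_speech = [True] * 100 + [False] * 100
snr_db = estimate_snr_db(samples, is_speech)
```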

24 Performance on SRE12
Train on tel+mic speech and test on noisy tel speech (CC4)
Use FaNT and a VAD to determine the SNR of test utts.
[Figure: male and female panels comparing PLDA and mPLDA]

25 Conclusions
Mixture of SNR-dependent PLDA is a flexible model that can handle noisy speech with a wide range of SNR.
The contributions of the mixtures are probabilistically combined based on the SNR of the test utterances and the target speaker’s utterances.
Results show that the mixture PLDA performs better than conventional PLDA whenever the SNR of test utterances varies widely.

26 Hard-Decision Mixture of PLDA

27 Training of mPLDA
Auxiliary function: [equation]
Annotations: no. of mixtures; latent indicator variables; latent speaker factors; SNR of training utterances; speaker indexes; session indexes

28 PLDA Scoring
x_s and x_t share the same z
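Why sharing z matters can be seen by stacking the two i-vectors. The following is a sketch under the usual PLDA assumptions (standard normal prior on z, independent residual noise for the two sessions), using the parameters ω = {m, V, Σ}; the symbols $\epsilon_s$ and $\epsilon_t$ are introduced here for illustration.

```latex
\begin{bmatrix} x_s \\ x_t \end{bmatrix}
  = \begin{bmatrix} m \\ m \end{bmatrix}
  + \begin{bmatrix} V \\ V \end{bmatrix} z
  + \begin{bmatrix} \epsilon_s \\ \epsilon_t \end{bmatrix}
\quad\Longrightarrow\quad
p(x_s, x_t \mid \text{same speaker})
  = \mathcal{N}\!\left(
      \begin{bmatrix} x_s \\ x_t \end{bmatrix}
      \,\middle|\,
      \begin{bmatrix} m \\ m \end{bmatrix},
      \begin{bmatrix}
        VV^{\mathsf T} + \Sigma & VV^{\mathsf T} \\
        VV^{\mathsf T} & VV^{\mathsf T} + \Sigma
      \end{bmatrix}
    \right)
```

The off-diagonal blocks $VV^{\mathsf T}$ come from the shared speaker factor; for different speakers they are zero and the joint density factorizes into two marginals.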

29 Probabilistic LDA (PLDA)
PLDA example: 2-D data in 1-D subspace z
Take a sample according to p(z)
Source: S. Prince, “Computer vision: models, learning and inference”, 2012
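The 2-D example can be mimicked with a short sampling sketch: draw one speaker factor z from a standard normal p(z), then generate several sessions for that speaker as m + v z plus noise. The isotropic noise, the function name, and all parameter values are assumptions made for illustration and are not taken from the cited source.

```python
import random

def sample_plda_2d(m, v, noise_std, n_sessions, rng):
    """Draw one speaker factor z and n_sessions 2-D observations for it."""
    z = rng.gauss(0.0, 1.0)  # speaker factor sampled from p(z) = N(0, 1)
    sessions = []
    for _ in range(n_sessions):
        # Residual noise, assumed isotropic with standard deviation noise_std.
        eps = [rng.gauss(0.0, noise_std) for _ in range(2)]
        sessions.append([m[d] + v[d] * z + eps[d] for d in range(2)])
    return z, sessions

rng = random.Random(0)
m = [1.0, 2.0]   # global mean (illustrative)
v = [2.0, 1.0]   # 2x1 loading "matrix" spanning the 1-D subspace
z, xs = sample_plda_2d(m, v, noise_std=0.1, n_sessions=5, rng=rng)

# With zero noise every session lies exactly on the line m + v * z.
z0, xs0 = sample_plda_2d(m, v, noise_std=0.0, n_sessions=3, rng=rng)
```

All sessions of one speaker cluster around the same point on the 1-D subspace, which is what lets PLDA separate speaker variability from session variability.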

