SNR-Dependent Mixture of PLDA for Noise Robust Speaker Verification

Man-Wai Mak, Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, China
Interspeech 2014

Contents: Motivation of Work; Conventional PLDA; Mixture of PLDA for Noise Robust Speaker Verification; Experiments on SRE12; Conclusions

Motivation
Conventional i-vector/PLDA systems use a single PLDA model to handle all SNR conditions.
(Diagram: enrollment utterances feed i-vector/PLDA scoring, which outputs a PLDA score.)

Motivation
We argue that a PLDA model should focus on a small range of SNR.
(Diagram: PLDA Models 1, 2 and 3, each producing its own PLDA score.)

Distribution of SNR in SRE12
Each SNR region is handled by a PLDA Model

Proposed Solution
The full spectrum of SNRs is handled by a mixture of PLDA in which the posteriors of the indicator variables depend on the utterance's SNR.
(Diagram: an SNR estimator feeds an SNR posterior estimator; PLDA Models 1, 2 and 3 are combined into a single PLDA score.)
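The SNR posterior estimator can be illustrated with a short sketch. It assumes the SNR (in dB) is modeled by a one-dimensional Gaussian mixture with one component per PLDA model; the function names and the three-component parameter values are hypothetical and chosen only for illustration, not taken from the talk.

```python
import math


def log_gaussian(x, mean, std):
    """Log density of a 1-D Gaussian N(x | mean, std^2)."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(std)
            - 0.5 * ((x - mean) / std) ** 2)


def snr_posteriors(snr_db, weights, means, stds):
    """Posterior probability of each mixture component given an SNR in dB.

    Computed in the log domain so that SNRs far from every component
    do not underflow to a zero denominator.
    """
    log_joint = [math.log(w) + log_gaussian(snr_db, mu, sd)
                 for w, mu, sd in zip(weights, means, stds)]
    peak = max(log_joint)
    unnorm = [math.exp(v - peak) for v in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical SNR model: low, medium and high SNR regions (illustrative values).
WEIGHTS = [0.3, 0.4, 0.3]
MEANS_DB = [6.0, 15.0, 30.0]
STDS_DB = [3.0, 4.0, 6.0]

if __name__ == "__main__":
    for snr in (5.0, 15.0, 35.0):
        post = snr_posteriors(snr, WEIGHTS, MEANS_DB, STDS_DB)
        print(snr, [round(p, 3) for p in post])
```

An utterance whose SNR falls between two regions receives non-negligible posteriors for both components, so more than one PLDA model contributes to its score.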

Key Features of Proposed Solution
Verification scores depend not only on the same-speaker and different-speaker likelihoods but also on the posterior probabilities of SNR.


Probabilistic LDA (PLDA)
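In PLDA, the i-vectors x are modeled by a factor analyzer of the form $x_{ij} = m + V z_i + \epsilon_{ij}$, where
$x_{ij}$ is the i-vector extracted from the j-th session of the i-th speaker,
$m$ is the global mean of all i-vectors,
$V$ is the speaker factor loading matrix,
$z_i$ is the speaker factor, with prior $z_i \sim \mathcal{N}(0, I)$, and
$\epsilon_{ij}$ is the residual noise with covariance $\Sigma$.
The density of x is $p(x) = \mathcal{N}(x \mid m, V V^{\mathsf{T}} + \Sigma)$.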

The PLDA parameters ω = {m, V, Σ} are estimated by maximizing an auxiliary function with the EM algorithm (E-step and M-step).

Mixture of PLDA Model
Parameters of mPLDA: one group of parameters models the SNR of utterances; a second group models the SNR-dependent i-vectors.

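Generative Model for mPLDA
Each i-vector is generated by one of the SNR-dependent PLDA models. The utterance's SNR, measured in dB, determines the posterior probability of each mixture (the posterior of SNR).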
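The equations of this slide did not survive in the transcript. One plausible way to write the generative model is sketched below. All symbols are assumed notation: $\ell_{ij}$ for the SNR of the j-th session of speaker i, $y_{ijk}$ for the indicator of mixture k, $\{\pi_k, \mu_k, \sigma_k\}$ for a one-dimensional Gaussian mixture over SNR, and $\{m_k, V_k, \Sigma_k\}$ for the k-th PLDA model, with K mixtures in total.

```latex
\begin{align}
  \ell_{ij} &\sim \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\ell \mid \mu_k, \sigma_k^2), \\
  P(y_{ijk} = 1 \mid \ell_{ij}) &=
    \frac{\pi_k \, \mathcal{N}(\ell_{ij} \mid \mu_k, \sigma_k^2)}
         {\sum_{r=1}^{K} \pi_r \, \mathcal{N}(\ell_{ij} \mid \mu_r, \sigma_r^2)}, \\
  z_i &\sim \mathcal{N}(0, I), \\
  x_{ij} \mid (y_{ijk} = 1, z_i) &\sim \mathcal{N}(m_k + V_k z_i, \, \Sigma_k).
\end{align}
```

With K = 1 the SNR no longer plays a role and the model reduces to conventional PLDA.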

PLDA vs mPLDA: Generative Model
(Figure: generative model of PLDA alongside that of the mixture of PLDA.)

Likelihood-Ratio Scores of mPLDA
Same-speaker likelihood: a function of the i-vectors of the target and test speakers and of the SNRs of the target and test utterances.
Different-speaker likelihood: a function of the same i-vectors and SNRs, with the two i-vectors treated as coming from different speakers.
Verification score = same-speaker likelihood / different-speaker likelihood
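The likelihood ratio can be made concrete with a toy sketch. It is not the talk's implementation: it assumes scalar (one-dimensional) i-vectors so that only the standard library is needed, and each hypothetical mixture is given as a tuple `(mean, loading, residual_variance)`. Under the same-speaker hypothesis the two i-vectors share one speaker factor, which produces the cross-covariance term `v_s * v_t`; the SNR posteriors of the target and test utterances weight every pair of mixtures. The function returns the log of the ratio.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bivariate_pdf(x, y, mean_x, mean_y, var_x, var_y, cov_xy):
    """Density of a 2-D Gaussian, written out in closed form."""
    det = var_x * var_y - cov_xy ** 2
    dx, dy = x - mean_x, y - mean_y
    quad = (var_y * dx * dx - 2.0 * cov_xy * dx * dy + var_x * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def mplda_log_lr(x_s, x_t, post_s, post_t, models):
    """Log likelihood ratio for scalar i-vectors x_s (target) and x_t (test).

    post_s, post_t: SNR posteriors of the two utterances, one per mixture.
    models: list of (mean, loading, residual_variance), one per mixture.
    """
    # Same speaker: both i-vectors share one speaker factor z.
    same = 0.0
    for g_s, (m_s, v_s, r_s) in zip(post_s, models):
        for g_t, (m_t, v_t, r_t) in zip(post_t, models):
            same += g_s * g_t * bivariate_pdf(
                x_s, x_t, m_s, m_t,
                v_s * v_s + r_s, v_t * v_t + r_t, v_s * v_t)
    # Different speakers: the two i-vectors are independent.
    diff_s = sum(g * normal_pdf(x_s, m, v * v + r)
                 for g, (m, v, r) in zip(post_s, models))
    diff_t = sum(g * normal_pdf(x_t, m, v * v + r)
                 for g, (m, v, r) in zip(post_t, models))
    return math.log(same) - math.log(diff_s * diff_t)


if __name__ == "__main__":
    # Hypothetical two-mixture model (noisy and clean), illustrative values only.
    toy_models = [(0.0, 2.0, 1.5), (0.2, 2.0, 0.5)]
    print(mplda_log_lr(2.0, 1.8, [0.9, 0.1], [0.2, 0.8], toy_models))
    print(mplda_log_lr(2.0, -2.0, [0.9, 0.1], [0.2, 0.8], toy_models))
```

With a single mixture and posteriors `[1.0]` the function reduces to the conventional PLDA log likelihood ratio. Real i-vectors are high-dimensional, so a practical version would replace the closed-form bivariate density with a multivariate Gaussian over the stacked i-vector pair.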

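PLDA vs mPLDA: Auxiliary Function
The auxiliary functions of PLDA and of the mixture of PLDA are compared. The mPLDA auxiliary function involves the latent indicator variables (one per mixture, up to the number of mixtures), the latent speaker factors, and the SNR of the training utterances, and it sums over speaker indexes and session indexes.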
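The auxiliary functions themselves are missing from the transcript. A hedged reconstruction under standard EM assumptions is given below. The notation is assumed rather than taken from the slide: $X$ and $L$ are the training i-vectors and their SNRs, $z_i$ the latent speaker factors, $y_{ijk}$ the latent indicator variables, $K$ the number of mixtures, and primes mark the parameters being re-estimated.

```latex
\begin{align}
  Q_{\text{PLDA}}(\omega' \mid \omega) &=
    \mathbb{E}_{Z}\Big\{ \sum_{i} \Big[ \log \mathcal{N}(z_i \mid 0, I)
      + \sum_{j} \log \mathcal{N}(x_{ij} \mid m' + V' z_i, \Sigma') \Big]
      \,\Big|\, X, \omega \Big\}, \\
  Q_{\text{mPLDA}}(\theta' \mid \theta) &=
    \mathbb{E}_{Y,Z}\Big\{ \sum_{i} \log \mathcal{N}(z_i \mid 0, I)
      + \sum_{i,j} \sum_{k=1}^{K} y_{ijk} \log \big[ \pi_k' \,
        \mathcal{N}(\ell_{ij} \mid \mu_k', \sigma_k'^2) \,
        \mathcal{N}(x_{ij} \mid m_k' + V_k' z_i, \Sigma_k') \big]
      \,\Big|\, X, L, \theta \Big\}.
\end{align}
```

Both expressions are expected complete-data log-likelihoods. The mPLDA version differs only in the indicator-weighted sum over mixtures and in the extra terms that model the SNR.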

PLDA vs mPLDA: E-Step
(Slide comparing the E-step of PLDA with that of the mixture of PLDA.)

PLDA vs mPLDA: M-Step
(Slide comparing the M-step of PLDA with that of the mixture of PLDA.)

Experiments
Evaluation dataset: Common evaluation condition 2 of NIST SRE 2012 core set.
Parameterization: 19 MFCCs together with energy, plus their 1st and 2nd derivatives → 60-dim.
UBM: gender-dependent, 1024 mixtures.
Total variability matrix: gender-dependent, 500 total factors.
I-vector preprocessing: whitening by WCCN, then length normalization, followed by LDA (500-dim → 200-dim) and WCCN.
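Two details of this setup are easy to check in code: the feature dimension arithmetic and the length-normalization step. The sketch below is illustrative only. The names are hypothetical, and the WCCN and LDA transforms are omitted because they require matrices estimated from training data.

```python
import math

# 19 MFCCs plus energy give 20 static coefficients per frame.
N_STATIC = 19 + 1
# Static coefficients plus their 1st and 2nd derivatives.
FEATURE_DIM = N_STATIC * 3


def length_normalize(vec):
    """Scale a vector to unit Euclidean length (length normalization)."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        raise ValueError("cannot length-normalize a zero vector")
    return [v / norm for v in vec]


if __name__ == "__main__":
    print(FEATURE_DIM)
    print(length_normalize([3.0, 4.0]))
```

In the pipeline listed above, length normalization is applied to the whitened 500-dimensional i-vectors before LDA reduces them to 200 dimensions.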

Experiments
In NIST 2012 SRE, training utterances from telephone channels are clean, but some of the test utterances are noisy.
We used the FaNT tool to add babble noise to the clean training utterances.
(Diagram labels: babble noise; utterances from microphone channels; FaNT; utterances from telephone channels.)
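The idea of adding noise at a target SNR can be sketched as follows. This is not FaNT's implementation: FaNT has its own level measurement and filtering options, which are not reproduced here. The sketch simply assumes SNR is defined as the ratio of mean signal power to mean noise power over the whole utterance, and all function names are hypothetical.

```python
import math
import random


def mean_power(signal):
    """Average power of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(speech, noise, target_snr_db):
    """Add noise to speech after scaling the noise to reach the target SNR (dB)."""
    ratio = 10.0 ** (target_snr_db / 10.0)
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * ratio))
    return [s + gain * n for s, n in zip(speech, noise)]


def measure_snr_db(speech, noise):
    """SNR in dB between a clean signal and the noise that was added to it."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [math.sin(0.01 * i) for i in range(8000)]   # stand-in for speech
    babble = [rng.gauss(0.0, 1.0) for _ in range(8000)]  # stand-in for babble noise
    noisy = mix_at_snr(clean, babble, 6.0)
    added = [n - c for n, c in zip(noisy, clean)]
    print(round(measure_snr_db(clean, added), 3))
```

Repeating the mixing at several target SNRs yields training utterances that cover the range of SNRs expected at test time.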

Performance on SRE12
Train on tel+mic speech and test on noisy tel speech (CC4).
Train on tel+mic speech and test on tel speech recorded in noisy environments (CC5).
Use FaNT and a VAD to determine the SNR of test utterances; see our ISCSLP14 paper.

Performance on SRE12 (CC4)
(Plots for male and female speakers comparing PLDA with mPLDA.)

Conclusions
Mixture of SNR-dependent PLDA is a flexible model that can handle noisy speech with a wide range of SNR.
The contributions of the mixtures are probabilistically combined based on the SNR of the test utterances and the target speaker's utterances.
Results show that the mixture of PLDA performs better than conventional PLDA whenever the SNR of test utterances varies widely.

Hard-Decision Mixture of PLDA

Training of mPLDA
Auxiliary function: defined over the latent indicator variables (one per mixture, up to the number of mixtures), the latent speaker factors, and the SNR of the training utterances, with sums over speaker indexes and session indexes.

PLDA Scoring
Under the same-speaker hypothesis, xs and xt share the same z.

PLDA example: 2-D data in a 1-D subspace z. Take a sample according to p(z).
Source: S. Prince, "Computer vision: models, learning and inference", 2012.
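The sampling procedure of this example can be sketched in a few lines. The sketch assumes a standard normal prior on the scalar factor z and a diagonal residual covariance, so only the standard library is needed; the function name and all parameter values are hypothetical.

```python
import random


def sample_speaker(m, v, residual_var, n_sessions, rng):
    """Draw one speaker factor z, then n_sessions 2-D observations.

    m: global mean (2 values); v: loading vector (2 values);
    residual_var: diagonal of the residual covariance (2 values).
    """
    z = rng.gauss(0.0, 1.0)  # take a sample according to p(z) = N(0, 1)
    sessions = []
    for _ in range(n_sessions):
        # x = m + v * z + noise, with independent noise per dimension
        x = [m[d] + v[d] * z + rng.gauss(0.0, residual_var[d] ** 0.5)
             for d in range(2)]
        sessions.append(x)
    return z, sessions


if __name__ == "__main__":
    rng = random.Random(1)
    z, xs = sample_speaker([1.0, -1.0], [2.0, 1.0], [0.1, 0.1], 3, rng)
    print(z)
    for x in xs:
        print(x)
```

All sessions of one speaker share the same z, so they scatter around a single point on the line m + v z, while different speakers land at different points along that line.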
