3. Applications to Speaker Verification 11/19/2018
Outline of the presentation: 3. Applications to Speaker Verification; 3.1 Feature extraction; 3.2 Speaker models; 3.3 Scoring normalization; 3.4 Video demo
Pre-processing and Feature Extraction: A/D converter -> Hamming windowing -> LPC analysis -> cepstrum and delta-cepstrum coefficients
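The front-end chain on this slide (windowing, LPC analysis, cepstral and delta-cepstral coefficients) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the presenters' implementation: it uses the autocorrelation method with the Levinson-Durbin recursion, the standard LPC-to-cepstrum recursion, and a simple two-frame difference for the delta coefficients. All function names, the model order, and the number of cepstral coefficients are hypothetical choices.

```python
import math

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def autocorr(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] for x[n] ~ sum_k a[k] * x[n-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)               # prediction error update
    return a[1:], err

def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c[1..n_ceps] of the all-pole model 1 / (1 - sum a_k z^-k)."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for m in range(1, n_ceps + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c[1:]

def frame_cepstrum(frame, order=12, n_ceps=12):
    """Hamming window -> LPC analysis -> LPC-derived cepstrum for one frame."""
    windowed = [s * w for s, w in zip(frame, hamming(len(frame)))]
    a, _ = levinson_durbin(autocorr(windowed, order), order)
    return lpc_to_cepstrum(a, n_ceps)

def delta(ceps_frames):
    """Delta cepstrum as half the difference between the next and previous frame."""
    last = len(ceps_frames) - 1
    out = []
    for t in range(len(ceps_frames)):
        prev = ceps_frames[max(t - 1, 0)]
        nxt = ceps_frames[min(t + 1, last)]
        out.append([(n - p) / 2.0 for n, p in zip(nxt, prev)])
    return out
```

A feature vector per frame would then be the cepstrum from `frame_cepstrum` concatenated with the matching row of `delta`. Pre-emphasis and frame blocking are omitted because the slide does not mention them.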
Pre-processing and Feature Extraction: Spectral envelope reconstructed from different feature parameters [Figure: amplitude (dB) versus frequency (Hz); curve labels: FFT-based signal spectrum, LP spectrum, spectrum derived from LP-cepstrum, cepstral processing]
Enrollment
RBF network: feature vectors -> K-means (function centers) -> K-nearest neighbor (function widths) -> linear regression (output weights)
EBF network: feature vectors -> K-means (function centers) -> covariance analysis or EM (covariance matrices) -> linear regression (output weights)
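The RBF enrollment path on this slide (K-means for the function centers, K-nearest neighbor for the function widths, linear regression for the output weights) can be sketched as follows. This is a toy illustration under assumptions: centers are found separately for speaker and background data, each width is the distance to the nearest other center, the output weights come from lightly regularized normal equations, and the targets are 1 for speaker frames and 0 for background frames. Function names, the `ridge` term, and the target coding are hypothetical. The EBF path would replace the scalar widths with covariance matrices, which is not shown here.

```python
import math
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(data, k, iters=20, seed=0):
    """Plain k-means; returns k function centers."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(data, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            j = min(range(k), key=lambda i: sq_dist(x, centers[i]))
            groups[j].append(x)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster goes empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def knn_widths(centers, n_neighbors=1):
    """Function width = mean distance to the nearest other centers."""
    widths = []
    for i, c in enumerate(centers):
        d = sorted(math.sqrt(sq_dist(c, o)) for j, o in enumerate(centers) if j != i)
        widths.append(sum(d[:n_neighbors]) / n_neighbors)
    return widths

def hidden_outputs(x, centers, widths):
    """Bias unit followed by the Gaussian basis function activations."""
    return [1.0] + [math.exp(-sq_dist(x, c) / (2 * s * s))
                    for c, s in zip(centers, widths)]

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_output_weights(data, targets, centers, widths, ridge=1e-6):
    """Linear regression for the output weights via the normal equations."""
    H = [hidden_outputs(x, centers, widths) for x in data]
    m = len(H[0])
    A = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    b = [sum(h[i] * t for h, t in zip(H, targets)) for i in range(m)]
    return solve(A, b)

def enroll(speaker_data, background_data, k=2):
    """Speaker centers plus background centers, then widths and output weights."""
    centers = kmeans(speaker_data, k) + kmeans(background_data, k)
    widths = knn_widths(centers)
    data = speaker_data + background_data
    targets = [1.0] * len(speaker_data) + [0.0] * len(background_data)
    return centers, widths, fit_output_weights(data, targets, centers, widths)

def predict(x, centers, widths, weights):
    return sum(w * h for w, h in zip(weights, hidden_outputs(x, centers, widths)))
```

Calling `enroll(speaker_frames, background_frames)` returns the three parameter groups named on the slide, and `predict` evaluates the trained network on one feature vector.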
Enrollment [Figure: network structure. Inputs (feature vectors) x1, x2, ..., xD feed a hidden layer consisting of the speaker centers, the background speakers' centers, and unit 0 (bias); output weights connect the hidden layer to the outputs]
Verification [Figure: inputs x1, x2, ..., xD -> network outputs y(x) -> softmax -> averaging of each output -> difference (+/-) of the two averages]
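The verification diagram (softmax, averaging, then a signed sum) can be read as the scoring procedure sketched below. This is one plausible reading, stated as an assumption: the network has two outputs per frame (speaker and anti-speaker), each frame's outputs are passed through a softmax, the two softmax outputs are averaged over the utterance, and the difference of the averages is compared with a threshold. The function names and the default threshold are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax over a short list of network outputs."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def utterance_score(frame_outputs):
    """frame_outputs: one (speaker_output, anti_speaker_output) pair per frame."""
    probs = [softmax(list(y)) for y in frame_outputs]
    avg_speaker = sum(p[0] for p in probs) / len(probs)
    avg_anti = sum(p[1] for p in probs) / len(probs)
    return avg_speaker - avg_anti

def verify(frame_outputs, threshold=0.0):
    """Accept the claimant when the averaged score exceeds the threshold."""
    return utterance_score(frame_outputs) > threshold
```

With two outputs the score lies between -1 and 1, so a threshold of 0 corresponds to accepting whenever the averaged speaker output is the larger of the two.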
Verification: Distributions of the average network outputs [Figure panels: RBF, EBF]
Verification: Error rates against decision threshold [Figure]
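Curves of error rate against decision threshold are usually built from two sets of scores: those of genuine claimants and those of impostors. The sketch below computes the false acceptance and false rejection rates at a given threshold and sweeps the threshold over the observed scores. It is a generic illustration with hypothetical function names, not the evaluation code behind the slide.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false acceptance rate, false rejection rate) at one threshold."""
    frr = sum(s <= threshold for s in genuine_scores) / len(genuine_scores)
    far = sum(s > threshold for s in impostor_scores) / len(impostor_scores)
    return far, frr

def sweep(genuine_scores, impostor_scores):
    """Error rates at every observed score, as (threshold, FAR, FRR) tuples."""
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    return [(t, *error_rates(genuine_scores, impostor_scores, t)) for t in thresholds]

def approx_eer(genuine_scores, impostor_scores):
    """Threshold where FAR and FRR are closest, and their mean at that point."""
    t, far, frr = min(sweep(genuine_scores, impostor_scores),
                      key=lambda row: abs(row[1] - row[2]))
    return t, (far + frr) / 2
```

Raising the threshold lowers the false acceptance rate and raises the false rejection rate; the point where the two curves cross is the equal error rate.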
Verification Results (TIMIT) [Figure or table: results against the number of centers per network]
Verification Results: Decision boundaries [Figure panels: EBF (diagonal covariance matrices), EBF (full covariance matrices)]
Conclusion: EBF networks with full covariance matrices trained with the EM algorithm outperform those whose basis function parameters are estimated by the k-means algorithm and sample covariance. RBF networks are the poorest performers in terms of verification accuracy.
Conclusion: EBF networks with full covariance matrices achieve the lowest error rates when networks with the same number of free parameters are compared.
4. Bonus Materials: Scoring Normalization for Speaker Verification
Purpose of Scoring Normalization [Figure: speech with a claimed speaker ID -> feature extraction -> X; X is scored by the speaker model of the claimed ID (score Sc) and by the impostor models (normalization term); the normalization term is subtracted from Sc]
Purpose of Scoring Normalization: If log L(X) > Threshold, accept the claimant; if log L(X) ≤ Threshold, reject the claimant. [Figure: probability versus x, with example points x1 (accept) and x2 (reject)]
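The normalized decision rule can be illustrated with a small sketch: the score of the claimed speaker's model minus a normalization term from the impostor models, compared with a threshold. Everything concrete here is an assumption made for illustration: the models are one-dimensional Gaussians given as (mean, variance) pairs rather than the EBFN models used in the presentation, scores are average log-likelihoods per frame, and the normalization term is the mean of the impostor models' scores.

```python
import math

def gauss_logpdf(x, mean, var):
    """Log density of a one-dimensional Gaussian (stand-in for a speaker model)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def avg_loglik(frames, model):
    """Average log-likelihood of the frames under a (mean, variance) model."""
    mean, var = model
    return sum(gauss_logpdf(x, mean, var) for x in frames) / len(frames)

def normalized_score(frames, speaker_model, impostor_models):
    """Speaker score Sc minus the impostor-based normalization term."""
    s_c = avg_loglik(frames, speaker_model)
    norm = sum(avg_loglik(frames, m) for m in impostor_models) / len(impostor_models)
    return s_c - norm

def decide(frames, speaker_model, impostor_models, threshold=0.0):
    """Accept when the normalized log score exceeds the threshold, else reject."""
    score = normalized_score(frames, speaker_model, impostor_models)
    return "accept" if score > threshold else "reject"
```

Because the same frames are scored by both the speaker model and the impostor models, factors that affect all scores alike tend to cancel in the difference, which is the usual motivation for normalizing before thresholding.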
EBFN-based normalization. Speaker models: elliptical basis function networks (EBFN) [Figure: network with speaker centers and anti-speaker centers]
References:
[1] Mak, M.W. and Kung, S.Y. (2000). "Estimation of elliptical basis function parameters by the EM algorithm with application to speaker verification," IEEE Trans. on Neural Networks, Vol. 11, No. 4, pp. 961-969.
[2] Yiu, K.K., Mak, M.W. and Li, C.K. (1999). "Gaussian mixture models and probabilistic decision-based neural networks for pattern classification: A comparative study," Neural Computing and Applications, Vol. 8, pp. 235-245.
[3] Zhang, W.D., Mak, M.W. and He, M.X. (2000). "A two-stage scoring method combining world and cohort models for speaker verification," Proc. ICASSP, Vol. 2, pp. 1193-1196.
[4] Lin, S.H., Kung, S.Y. and Lin, L.J. (1997). "Face recognition/detection by probabilistic decision-based neural network," IEEE Trans. on Neural Networks, Vol. 8, No. 1, pp. 114-132.
[5] Mak, M.W. et al. (1994). "Speaker Identification using Multi Layer Perceptrons and Radial Basis Functions Networks," Neurocomputing, Vol. 6, No. 1, pp. 99-118.