1
3. Applications to Speaker Verification
11/19/2018
2
Outline of the presentation
3. Applications to Speaker Verification
   3.1 Feature extraction
   3.2 Speaker models
   3.3 Scoring normalization
   3.4 Video demo
3
Pre-processing and Feature Extraction
[Block diagram: A/D converter -> Hamming windowing -> LPC analysis -> cepstrum and delta-cepstrum coefficients]
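The stages named on this slide (Hamming windowing, LPC analysis, cepstrum and delta cepstrum) can be sketched for a single frame as follows. This is a minimal illustration, not the presenters' implementation: the function names, the autocorrelation method with the Levinson-Durbin recursion, the default model order, and the simple two-frame delta are all assumptions made for the example.

```python
import math

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def autocorr(x, order):
    """Autocorrelation r[0..order] of one frame."""
    return [sum(x[i] * x[i + k] for i in range(len(x) - k)) for k in range(order + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion.
    Returns predictor coefficients a[1..order] for s[n] ~ sum_k a[k] * s[n-k]
    and the final prediction-error energy."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err

def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum of the all-pole model 1 / (1 - sum_k a[k] z^-k):
    c[n] = a[n] + sum_{k=1}^{n-1} (k/n) * c[k] * a[n-k], with a[n] = 0 for n > p."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

def frame_cepstrum(frame, order=12, n_ceps=12):
    """Window one frame, fit an LPC model, and convert it to cepstral coefficients."""
    w = hamming(len(frame))
    x = [s * wi for s, wi in zip(frame, w)]
    a, _ = levinson(autocorr(x, order), order)
    return lpc_to_cepstrum(a, n_ceps)

def delta(frames):
    """Delta cepstrum as (c[t+1] - c[t-1]) / 2, with the edge frames clamped."""
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        prev, nxt = frames[max(t - 1, 0)], frames[min(t + 1, last)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out
```

A full front end would slice the digitized signal into overlapping frames, call frame_cepstrum on each, and append the output of delta to form the feature vectors.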
4
Pre-processing and Feature Extraction
[Figure: spectral envelope reconstructed from different feature parameters (cepstral processing). Curves: FFT-based signal spectrum, LP spectrum, spectrum derived from LP-cepstrum. Axes: spectrum amplitude (dB) against frequency (Hz).]
5
Enrollment
[Block diagram: parameter estimation for the two network types]
RBF network: feature vectors -> K-means (function centers) -> K-nearest neighbor (function widths) -> linear regression (output weights W)
EBF network: feature vectors -> K-means (function centers) -> covariance analysis or EM (covariance matrices) -> linear regression (output weights W)
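The RBF enrollment path on this slide (K-means for the centers, K-nearest neighbor for the widths, linear regression for the output weights) can be sketched with NumPy as below. The function names, the Gaussian form of the basis function, the choice of "mean distance to the k nearest other centers" as the width heuristic, and the one-hot targets are assumptions for illustration; the slides do not spell these details out. The EBF path would replace knn_widths with a covariance estimate per center (sample covariance or EM) and is not shown.

```python
import numpy as np

def kmeans(X, n_centers, n_iter=20, seed=0):
    """Plain K-means; returns the function centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_centers, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(n_centers):
            if np.any(labels == j):  # leave a center in place if its cluster is empty
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def knn_widths(centers, k=2):
    """Width of each basis function: mean distance to its k nearest other centers.
    Requires more than k centers."""
    d = np.sqrt(((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))
    return np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)  # column 0 is the self-distance

def hidden_outputs(X, centers, widths):
    """Gaussian basis function outputs with a leading bias column of ones."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * widths[None, :] ** 2))
    return np.hstack([np.ones((len(X), 1)), phi])

def train_rbf(X, T, n_centers, k=2):
    """Three-stage training: centers, then widths, then least-squares output weights."""
    centers = kmeans(X, n_centers)
    widths = knn_widths(centers, k)
    W, *_ = np.linalg.lstsq(hidden_outputs(X, centers, widths), T, rcond=None)
    return centers, widths, W
```

Because the hidden layer is fixed before the output weights are fitted, the last stage is an ordinary linear least-squares problem, which is why linear regression suffices there.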
6
Enrollment
[Figure: network structure. Inputs x1, x2, ..., xD (feature vectors) feed basis functions placed at the speaker centers and at the background speakers' centers; output weights, including a bias term (index 0), connect the basis functions to the outputs.]
7
Verification
[Block diagram: inputs x1, x2, ..., xD -> network outputs y(x) -> softmax -> averaging (one averager per output) -> difference (+/-) of the averaged outputs]
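One possible reading of the verification diagram (softmax on the network outputs, averaging over the frames of an utterance, then a difference of the two averages) is sketched below. It assumes that the network has two outputs per frame, one for the speaker class and one for the background class, and that the claimant is accepted when the difference exceeds a threshold; the function names and the default threshold are hypothetical.

```python
import math

def softmax(y):
    """Numerically stable softmax of one frame's network outputs."""
    m = max(y)
    e = [math.exp(v - m) for v in y]
    s = sum(e)
    return [v / s for v in e]

def utterance_score(frame_outputs):
    """frame_outputs: one [y_speaker, y_background] pair per frame (assumed layout).
    Returns the averaged speaker output minus the averaged background output."""
    probs = [softmax(y) for y in frame_outputs]
    n = len(probs)
    avg_speaker = sum(p[0] for p in probs) / n
    avg_background = sum(p[1] for p in probs) / n
    return avg_speaker - avg_background

def verify(frame_outputs, threshold=0.0):
    """Accept the claimant when the utterance score exceeds the threshold."""
    return utterance_score(frame_outputs) > threshold
```

With two outputs the score lies between -1 and 1, so a threshold of 0 corresponds to accepting whenever the averaged speaker output is the larger of the two.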
8
Verification
[Figure: distributions of the average network outputs. Panels: RBF, EBF.]
9
Verification
[Figure: error rates against decision threshold.]
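A curve of error rates against the decision threshold can be produced by sweeping the threshold over a set of utterance scores. The sketch below assumes the two error rates are the false acceptance rate (impostor scores above the threshold) and the false rejection rate (genuine scores at or below it), which is the usual convention in speaker verification but is not stated on the slide; the function names and sample scores are made up for the example.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FAR, FRR) at one threshold.
    FAR: fraction of impostor scores accepted (score > threshold).
    FRR: fraction of genuine scores rejected (score <= threshold)."""
    far = sum(s > threshold for s in impostor) / len(impostor)
    frr = sum(s <= threshold for s in genuine) / len(genuine)
    return far, frr

def closest_to_equal_error(genuine, impostor):
    """Try every observed score as a threshold and return the
    (threshold, FAR, FRR) triple where FAR and FRR are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        if best is None or abs(far - frr) < abs(best[1] - best[2]):
            best = (t, far, frr)
    return best

genuine_scores = [0.9, 0.8, 0.7, 0.4]   # hypothetical scores of true speakers
impostor_scores = [0.1, 0.2, 0.3, 0.6]  # hypothetical scores of impostors
print(closest_to_equal_error(genuine_scores, impostor_scores))
```

Raising the threshold lowers the false acceptance rate and raises the false rejection rate, so the two curves cross; the crossing point is commonly reported as the equal error rate.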
10
Verification Results (TIMIT)
[Figure or table: results by number of centers per network.]
11
Verification Results: Decision Boundaries
[Figure: decision boundaries. Panels: EBF (diagonal covariance matrices), EBF (full covariance matrices).]
12
Conclusion
EBF networks with full covariance matrices trained with the EM algorithm outperform those whose basis function parameters are estimated by the k-means algorithm and sample covariance. RBF networks are the poorest performers in terms of verification accuracy.
13
Conclusion
When networks with the same number of free parameters are compared, EBF networks with full covariance matrices achieve the lowest error rates.
14
4. Bonus Materials: Scoring Normalization for Speaker Verification
15
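Purpose of Scoring Normalization
[Block diagram: speech with claimed speaker ID -> feature extraction -> X. X is scored by the speaker model of the claimed ID (Sc) and by the impostor models (normalization term); the normalization term is subtracted from the speaker score.]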
16
Purpose of Scoring Normalization
Accept the claimant if log L(X) > Threshold; otherwise reject the claimant.
[Figure: probability (Prob.) against x, with example points x1 (accept) and x2 (reject).]
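The normalization and the threshold test can be combined in a short sketch. It assumes that log L(X) is the log-likelihood under the claimed speaker's model minus a normalization term, and that the normalization term is the log of the mean likelihood over a set of impostor models; the slides do not define the term, so this is one common choice rather than the presenters' formula. Function names and numbers are hypothetical.

```python
import math

def normalized_score(log_lik_speaker, log_lik_impostors):
    """log L(X) = log p(X | claimed speaker) - normalization term.
    Assumed normalization term: log of the mean impostor likelihood,
    computed with the log-sum-exp trick for numerical stability."""
    m = max(log_lik_impostors)
    mean_lik = sum(math.exp(v - m) for v in log_lik_impostors) / len(log_lik_impostors)
    return log_lik_speaker - (m + math.log(mean_lik))

def decide(log_lik_speaker, log_lik_impostors, threshold):
    """Accept the claimant if log L(X) > threshold, otherwise reject."""
    score = normalized_score(log_lik_speaker, log_lik_impostors)
    return "accept" if score > threshold else "reject"
```

Subtracting the impostor term makes the score a ratio rather than an absolute likelihood, so a single threshold is less sensitive to utterance-level effects that shift the speaker and impostor likelihoods together.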
17
EBFN-based normalization
Speaker models: elliptical basis function networks (EBFN)
[Figure: EBFN with speaker centers and anti-speaker centers.]