Published by Jeffrey Chapman. Modified over 8 years ago.
1
Text independent speaker identification in multilingual environments I. Luengo, E. Navas, I. Sainz, I. Saratxaga, J. Sanchez, I. Odriozola and I. Hernaez
2
Contents
- Introduction
- SR in language-mismatched conditions
- Existing solutions
- Proposed solution
- Working database
- Variability measures
- Experimental results
- Conclusions
3
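Speaker recognition system
[Block diagram. TRAIN: feature extraction, then training of the speaker model M. TEST: feature extraction, then scoring against M, then decision.]
Language mismatch between training and test? Accuracy decreases.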
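The train/test flow in the diagram can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not say which model type was used, so a single diagonal Gaussian per speaker stands in for the speaker model M, and the "features" are toy 2-dimensional vectors rather than real MFCCs. All function and speaker names are hypothetical.

```python
import math

def train_model(frames):
    """Fit a diagonal Gaussian (per-dimension mean and variance) to feature frames.

    Assumption: a single Gaussian stands in for the speaker model M.
    """
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    variances = [
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, 1e-6)  # variance floor
        for d in range(dims)
    ]
    return means, variances

def log_likelihood(frames, model):
    """Average per-frame log-likelihood of the frames under a diagonal Gaussian."""
    means, variances = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, means, variances):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)

def identify(test_frames, models):
    """Score the test frames against every model and pick the best speaker."""
    return max(models, key=lambda spk: log_likelihood(test_frames, models[spk]))

# Toy training data for two hypothetical speakers.
train = {
    "spk_a": [[0.0, 1.0], [0.2, 0.8], [-0.1, 1.1], [0.1, 0.9]],
    "spk_b": [[3.0, -1.0], [3.2, -0.8], [2.9, -1.2], [3.1, -1.1]],
}
models = {spk: train_model(frames) for spk, frames in train.items()}

test_utterance = [[0.05, 0.95], [0.15, 1.05]]
predicted = identify(test_utterance, models)
```

In this picture, a language mismatch means the test frames come from a distribution that the model never saw in training, which lowers the score of the true speaker relative to the others.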
4
Existing solutions
- Multi-language training
  - One model per speaker, trained with several languages
  - The model learns the characteristics of the different languages
- Multi-model training
  - One model per language for each speaker
  - Needs a language detector
5
Existing solutions: drawbacks
- The possible languages must be known in advance for each speaker
- Does not generalize to languages not seen during training
- More recording sessions are needed for training: more time, more money
Desired solution
- Language independent
- Suitable for languages not seen during training
- Capable of single-language training
6
Proposed solution
- Language-independent features
  - Normalization?
  - New features?
- Short-term intonation and energy values
  - High speaker discrimination capability
  - Global distribution may change little with language
  - Combinable with MFCC
  - Only in voiced frames (intonation)
  - High session variability: MVN for inter-session normalization
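A minimal sketch of the two processing steps named on this slide: keeping intonation and energy only for voiced frames, and applying MVN per session. It assumes that MVN means mean and variance normalization (zero mean, unit variance per feature dimension) and that unvoiced frames are marked by an F0 value of 0; neither detail is spelled out in the slides. The function names and the sample values are hypothetical.

```python
import statistics

def prosodic_stream(f0, energy):
    """Keep (F0, energy) pairs only for voiced frames.

    Assumption: the pitch tracker reports f0 == 0 for unvoiced frames.
    """
    return [[p, e] for p, e in zip(f0, energy) if p > 0]

def mvn(frames):
    """Mean and variance normalization over one session, per dimension."""
    dims = len(frames[0])
    columns = [[f[d] for f in frames] for d in range(dims)]
    means = [statistics.fmean(c) for c in columns]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return [[(f[d] - means[d]) / stds[d] for d in range(dims)] for f in frames]

# Made-up frame values for one recording session.
f0 = [0.0, 120.0, 125.0, 0.0, 130.0]
energy = [0.1, 0.8, 0.9, 0.2, 1.0]

session = prosodic_stream(f0, energy)   # three voiced frames remain
normalized = mvn(session)
```

Normalizing each session separately removes session-level offsets in pitch and energy (for example a different recording gain), which is the kind of variability the slide attributes to sessions.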
7
Database
- Bilingual Spanish-Basque speech database
- 22 speakers (11 male, 11 female)
- 4 sessions (inter-session variability)
- 7 numeric sequences (8 digits each) per session and language
8
Variability measures
Adding new features ALWAYS increases separability/variability:
- More speaker separability: more discrimination
- More language variability: more model/test mismatch
- More session variability: more model/test mismatch
Key issue: does speaker separability increase more than language/session variability?
9
Variability measures
Kullback-Leibler divergence for variability estimation
Interesting measures (good if the new features increase these ratios):
- Inter-speaker variability / inter-language variability
- Inter-speaker variability / inter-session variability
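One way such ratios could be computed is sketched below. The slides do not say how the Kullback-Leibler divergence was estimated or between which distributions, so this sketch assumes each condition is summarized by a diagonal Gaussian and uses the closed-form divergence, symmetrized and averaged over all pairs. The function names and toy models are hypothetical.

```python
import math

def kl_diag_gauss(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two diagonal Gaussians (closed form)."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )

def symmetric_kl(p, q):
    """Symmetrized divergence; p and q are (means, variances) tuples."""
    return kl_diag_gauss(*p, *q) + kl_diag_gauss(*q, *p)

def mean_pairwise_divergence(models):
    """Average symmetric KL over all unordered pairs of models."""
    pairs = [(a, b) for i, a in enumerate(models) for b in models[i + 1:]]
    return sum(symmetric_kl(a, b) for a, b in pairs) / len(pairs)

# Toy 1-dimensional models (assumed values, for illustration only).
speakers = [([0.0], [1.0]), ([2.0], [1.0])]    # two speakers, same language
languages = [([0.0], [1.0]), ([1.0], [1.0])]   # one speaker, two languages

inter_speaker = mean_pairwise_divergence(speakers)
inter_language = mean_pairwise_divergence(languages)
ratio = inter_speaker / inter_language
```

A feature set would be judged better when this ratio grows, that is, when the distance between speakers increases more than the distance between languages (or sessions) of the same speaker.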
10
Variability measures

              MFCC   MFCC+P   Gain
Lang      -   4.09   4.61     12%
Spk       S   6.34   8.25     30%
          B   6.82   8.77     29%
Ses       S   3.62   4.81     33%
          B   3.52   4.64     32%
Spk/Lang  S   1.55   1.79     15%
          B   1.67   1.90     14%
Spk/Ses   S   1.75   1.72     -2%
          B   1.94   1.89     -3%
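The ratio rows of this table follow from the rows above them, which can be checked with a few lines of arithmetic. The sketch assumes that S and B stand for Spanish and Basque, and that the Spk/Lang and Spk/Ses rows are the Spk values divided by the Lang and Ses values of the same column; the variable names are hypothetical.

```python
# (MFCC, MFCC+P) pairs copied from the table.
lang = (4.09, 4.61)
spk = {"S": (6.34, 8.25), "B": (6.82, 8.77)}
ses = {"S": (3.62, 4.81), "B": (3.52, 4.64)}

def ratios(numerator, denominator):
    """Column-wise ratio, rounded to two decimals as in the table."""
    return tuple(round(n / d, 2) for n, d in zip(numerator, denominator))

def gain_percent(before, after):
    """Relative change from the MFCC column to the MFCC+P column, in percent."""
    return 100.0 * (after - before) / before

spk_lang = {k: ratios(v, lang) for k, v in spk.items()}
spk_ses = {k: ratios(spk[k], ses[k]) for k in spk}

for label, (before, after) in spk_lang.items():
    print(f"Spk/Lang {label}: {before:.2f} -> {after:.2f} "
          f"({gain_percent(before, after):+.1f}%)")
for label, (before, after) in spk_ses.items():
    print(f"Spk/Ses  {label}: {before:.2f} -> {after:.2f} "
          f"({gain_percent(before, after):+.1f}%)")
```

The recomputed ratios reproduce the Spk/Lang and Spk/Ses rows, and the recomputed gains agree with the Gain column to within one percentage point.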
11
Experimental results
X-Y: training in X, testing in Y

              S-S     B-B     S-B     B-S     SB-S    SB-B
MFCC (ref)    98.3    97.3    63.6    67.3    96.8    95.6
MFCC (V)      97.6    96.8    62.6    67.0    96.6    95.6
MFCC+P (V)    97.1    96.3    71.0    73.0    96.1    94.4
Gain (V)      -0.5%   -0.5%   13.4%   9.0%    -0.5%   -1.3%
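The Gain (V) row appears to be the relative change from MFCC (V) to MFCC+P (V), which the short check below reproduces from the accuracies in the table. The relative-gain formula is an inference from the numbers, not something stated in the slides, and the names are hypothetical.

```python
conditions = ["S-S", "B-B", "S-B", "B-S", "SB-S", "SB-B"]
mfcc_v = [97.6, 96.8, 62.6, 67.0, 96.6, 95.6]     # MFCC (V) accuracies
mfcc_p_v = [97.1, 96.3, 71.0, 73.0, 96.1, 94.4]   # MFCC+P (V) accuracies

def relative_gain(baseline, new):
    """Relative accuracy change in percent, rounded to one decimal."""
    return round(100.0 * (new - baseline) / baseline, 1)

gains = {
    cond: relative_gain(base, new)
    for cond, base, new in zip(conditions, mfcc_v, mfcc_p_v)
}

for cond, gain in gains.items():
    print(f"{cond:5s} {gain:+.1f}%")
```

The large positive values fall on the two language-mismatched conditions (S-B and B-S), while the matched and mixed-training conditions lose about one percent or less.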
12
Conclusions
- Short-term intonation and energy values increase language robustness
  - Little accuracy drop in language-matched conditions
  - Very useful if the test language is unpredictable
- Variability measures predict the results reasonably well
  - Allows easy selection of features prior to experiments