Results obtained in speaker recognition using Gaussian Mixture Models
Marieta Gâta*, Gavril Toderean**
*North University of Baia Mare
**Technical University of Cluj Napoca
1. Introduction
- A speaker identification system based on Gaussian Mixture Models (GMM), with good performance for text-independent speech and short test utterances
- The speaker recognition technique used is based on the GMM approach and consists of three phases: parameterization, model training, classification
- Classification compares a model of the speech of an unknown speaker with the models of the speakers in our database
- The training process uses the EM (Expectation-Maximization) algorithm for the GMMs
- We study the influence of different parameters on the GMM system's performance: number of mixture components, amount of training data (length of the wav file in seconds), number of iterations
- Each speaker is modeled by a probability density function consisting of a maximum of 12 mixtures; for M speakers, identification finds the speaker model with the maximum posterior probability for the input vector sequence
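The three phases above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: scikit-learn's GaussianMixture stands in for the authors' EM training code, and random vectors stand in for real MFCC features.

```python
# Sketch of GMM-based speaker identification: train one GMM per speaker,
# then classify an unknown utterance as the speaker whose model gives the
# highest average log-likelihood (equivalently, the maximum posterior under
# equal priors). Synthetic 12-dimensional vectors stand in for MFCCs.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def train_speaker_models(features_per_speaker, n_components=12):
    """Fit one diagonal-covariance GMM per enrolled speaker (EM training)."""
    models = {}
    for speaker, feats in features_per_speaker.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag",
                              max_iter=50, random_state=0)
        gmm.fit(feats)
        models[speaker] = gmm
    return models

def identify(models, test_features):
    """Return the speaker whose model maximizes the mean log-likelihood."""
    return max(models, key=lambda s: models[s].score(test_features))

# Two synthetic "speakers" with well-separated feature distributions.
train = {
    "speaker_A": rng.normal(loc=0.0, scale=1.0, size=(500, 12)),
    "speaker_B": rng.normal(loc=5.0, scale=1.0, size=(500, 12)),
}
models = train_speaker_models(train, n_components=4)
test_utt = rng.normal(loc=5.0, scale=1.0, size=(100, 12))
print(identify(models, test_utt))  # speaker_B
```

With real data, the feature arrays would be the MFCC vector sequences described in the next slide.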
2. Speech Database
- The systems are evaluated on a Romanian speech database
- Number of speakers = 200 (123 male and 77 female), different age classes (students, ages 18-22)
- Each speaker: 4 sentences (2 for testing and 2 for training) => different sentences are used for training and for testing
- Recorded in 2-3 sessions (time between sessions < 1 month)
- Clean speech (laboratory background), recorded with a microphone and sampled at 22 kHz, 16-bit, mono
- Length of the training and testing sentences: from 4 to 10 seconds
- Training sentences:
  "Un numar de telefon este format din cifrele zero unu doi trei patru cinci sase sapte opt noua zece" ("A telephone number is made up of the digits zero one two three four five six seven eight nine ten")
  "Principalele operatii matematice sunt adunarea scaderea inmultirea si impartirea" ("The main mathematical operations are addition, subtraction, multiplication and division")
- Testing sentences:
  "Numarul meu de telefon este patru zero doi sase doi unu doi trei patru cinci" ("My telephone number is four zero two six two one two three four five")
  "Automobilul meu atinge o viteza de o suta optzeci de km pe ora" ("My car reaches a speed of one hundred eighty km per hour")
- Feature vectors: 12th-order MFCCs (Mel Frequency Cepstral Coefficients), obtained from a bank of 20 mel-warping filters
- First experiment: relation between the number of mixture components and recognition performance
- Second experiment: relation between the amount of training data (length of the wav files), the number of mixture components and recognition performance
- Third and fourth experiments: relation between the number of iterations, the number of mixture components and recognition performance
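The feature extraction above can be sketched with numpy/scipy alone. The 12 coefficients, 20 mel filters and 22 kHz rate match the slide; the frame length, frame shift and exact filterbank construction are assumptions, not details taken from the paper.

```python
# Minimal MFCC extraction sketch: frame the signal, take the power
# spectrum, pool it through 20 mel-spaced triangular filters, then apply a
# DCT to the log filterbank energies and keep the first 12 coefficients.
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=22050, n_filters=20, n_ceps=12,
         frame_len=512, frame_shift=256):
    """Return an (n_frames, n_ceps) array of MFCC feature vectors."""
    window = np.hamming(frame_len)
    fb = mel_filterbank(n_filters, frame_len, sr)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        frame = signal[start:start + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        logmel = np.log(fb @ power + 1e-10)
        # The DCT decorrelates the log energies; keep 12 coefficients.
        frames.append(dct(logmel, type=2, norm="ortho")[:n_ceps])
    return np.array(frames)

# One second of synthetic "speech" (white noise) at 22 kHz.
feats = mfcc(np.random.default_rng(0).normal(size=22050))
print(feats.shape)  # (n_frames, 12)
```

These per-frame 12-dimensional vectors are what the GMMs are trained on.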
3. Relation between the number of mixture components and recognition performance
- A GMM with a full covariance matrix gives the most complex models; we use a simplified form in which each component consists of a diagonal covariance matrix, a mean and a weight
- The models were tested => results in Figure 1 (the training process used 10 seconds of speech)
- Better results are obtained with larger models (more components)
Figure 1. GMM with diagonal covariance matrix
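A quick parameter count (standard counting, not taken from the paper) shows why the diagonal simplification matters when only a few seconds of training speech are available per speaker:

```python
# Parameter counts for an M-component GMM over D-dimensional features.
# Diagonal covariances need far fewer parameters than full covariances,
# so they can be estimated reliably from little training data.
def gmm_param_count(n_components, dim, covariance="diag"):
    """Weights + means + covariance parameters, summed over components."""
    if covariance == "diag":
        per_comp = dim + dim + 1                    # mean, variances, weight
    else:                                           # full symmetric covariance
        per_comp = dim + dim * (dim + 1) // 2 + 1
    return n_components * per_comp

# 12 components over 12-dimensional MFCC vectors, as in the experiments.
print(gmm_param_count(12, 12, "diag"))  # 300
print(gmm_param_count(12, 12, "full"))  # 1092
```

At M = 12 and D = 12 the full-covariance model has more than three times as many free parameters as the diagonal one.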
4. Relation between the amount of training data, the number of mixture components and recognition performance
- GMMs of several model sizes were trained on wav files of 4, 6 and 10 seconds
- Recognition results improve with the number of mixture components and with the amount of training data
- Results in Figure 2, recognition scores in Table 1 (best recognition results at M = 12)
- Growing the length of the training data gives the best recognition results at the right of the figure (higher number of components in the model); the recognition error in the right part of the figure is lower than in the left part
- A small amount of training data (4 and 6 seconds) is not ideal for the GMM method
- The best results were achieved with 12 GMM components and wav files of 10 seconds
Figure 2. Performance curves obtained from GMM models trained with different amounts of data
Table 1. GMM identification performance for different amounts of training data and model orders
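The experimental loop behind Figure 2 and Table 1 can be sketched as below. Everything here is illustrative: synthetic features stand in for MFCCs, scikit-learn's GaussianMixture stands in for the paper's EM code, and the assumed frame rate is not from the paper, so the printed accuracies do not reproduce the published numbers.

```python
# Sketch of the second experiment: vary the amount of training data and
# the model order M, train one GMM per (synthetic) speaker, and measure
# closed-set identification accuracy.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
FRAMES_PER_SEC = 86  # assumed frame rate, not stated in the paper

def run_trial(train_seconds, n_components, n_speakers=5, dim=12):
    """Train one GMM per synthetic speaker; return identification accuracy."""
    means = rng.normal(scale=3.0, size=(n_speakers, dim))  # speaker "voices"
    n_train = int(train_seconds * FRAMES_PER_SEC)
    models = []
    for m in means:
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag",
                              max_iter=50, random_state=0)
        gmm.fit(rng.normal(loc=m, size=(n_train, dim)))
        models.append(gmm)
    correct = 0
    for true_id, m in enumerate(means):
        test = rng.normal(loc=m, size=(100, dim))
        scores = [g.score(test) for g in models]
        correct += int(np.argmax(scores) == true_id)
    return correct / n_speakers

for seconds in (4, 6, 10):          # training lengths from the experiment
    for M in (4, 12):               # smallest and largest model orders
        print(f"{seconds:2d}s, M={M:2d}: accuracy {run_trial(seconds, M):.2f}")
```

On the real database, the grid of accuracies from such a loop is exactly what Table 1 reports.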
5. Relation between the number of iterations, the number of mixture components and recognition performance
- We study the influence of the number of EM iterations on the recognition score: the score improves up to 10 iterations; 50 iterations are recommended
- Results in Figure 3 and Figure 4
Figure 3. Influence of the number of EM iterations on the recognition performance of GMMs with diagonal covariance matrices, for models with 4, 6, 8, 10 and 12 mixture components
Figure 4. Influence of the number of EM iterations on the recognition performance of GMMs with diagonal covariance matrices, for models with 4, 8 and 12 mixture components
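Why do more iterations help? Each EM step is guaranteed not to decrease the training-data log-likelihood, so the model fit keeps improving until it plateaus. The minimal numpy sketch below writes the diagonal-covariance EM iteration out in full (initialization and data are synthetic, and the tiny variance floor is my own addition for numerical safety):

```python
# EM for a diagonal-covariance GMM, recording the log-likelihood at each
# iteration to show the monotone improvement that motivates running
# several tens of iterations.
import numpy as np

def em_diag_gmm(X, M, n_iter, seed=0):
    """Run n_iter EM iterations; return the per-iteration log-likelihoods."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    weights = np.full(M, 1.0 / M)
    means = X[rng.choice(N, M, replace=False)].astype(float)
    variances = np.var(X, axis=0) * np.ones((M, D))
    log_likes = []
    for _ in range(n_iter):
        # E-step: log p(x_n, component m) under each Gaussian.
        log_p = (-0.5 * (np.log(2 * np.pi * variances).sum(axis=1)
                 + (((X[:, None, :] - means) ** 2) / variances).sum(axis=2))
                 + np.log(weights))
        log_norm = np.logaddexp.reduce(log_p, axis=1)   # log p(x_n)
        log_likes.append(log_norm.sum())
        resp = np.exp(log_p - log_norm[:, None])        # responsibilities
        # M-step: re-estimate weights, means and variances.
        Nk = resp.sum(axis=0)
        weights = Nk / N
        means = (resp.T @ X) / Nk[:, None]
        variances = (resp.T @ (X ** 2)) / Nk[:, None] - means ** 2 + 1e-6
    return np.array(log_likes)

X = np.random.default_rng(0).normal(size=(400, 4))
X[:200] += 3.0                                  # two loose clusters
ll = em_diag_gmm(X, M=2, n_iter=20)
print(bool(np.all(np.diff(ll) >= -1e-6)))       # True: EM never decreases it
```

The likelihood curve flattens after the first several iterations, which is consistent with the slide's observation that gains appear within the first 10 iterations while 50 give a safe margin.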
6. Conclusions
- The performance of the method is good
- Maximizing the use of the speaker data (maximizing the size of the model) improves speaker recognition
- A model size that is too large for a small amount of training data reduces the performance of the recognition system
- The best performance was obtained with:
  -12 GMM mixture components
  -50 iterations of the EM process
  -wav files of 10 seconds