

Results obtained in speaker recognition using Gaussian Mixture Models
Marieta Gâta*, Gavril Toderean**
*North University of Baia Mare
**Technical University of Cluj-Napoca

1. Introduction
- We present a speaker identification system based on Gaussian Mixture Models (GMM), which perform well on text-independent speech and short test utterances.
- The GMM-based speaker recognition technique consists of three phases:
  - parameterization
  - model training
  - classification
- Classification compares a model of the speech of an unknown speaker with the models of the speakers in our database.
- Training uses the EM (Expectation-Maximization) algorithm for GMM.
- We study the influence of several parameters on the performance of the GMM system:
  - number of mixture components
  - amount of training data (length of the wav file in seconds)
  - number of iterations
- Each probability density function consists of a maximum of 12 mixtures.
- For M speakers, the task is to find the speaker model with maximum posterior probability for the input vector sequence.
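The slide text does not preserve its equations, so the following is a sketch of the standard GMM identification formulation rather than the authors' exact notation. The symbols are assumptions: K is the number of mixture components, S the number of speakers (the slides use M for both quantities at different points), \lambda_s the model of speaker s, and X = {x_1, ..., x_T} the sequence of feature vectors. The second equality in the decision rule further assumes equal speaker priors and independent frames.

```latex
\begin{align}
p(\mathbf{x} \mid \lambda_s) &= \sum_{i=1}^{K} w_i \, \mathcal{N}\!\left(\mathbf{x};\, \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i\right),
\qquad \sum_{i=1}^{K} w_i = 1, \\
\hat{s} &= \arg\max_{1 \le s \le S} P(\lambda_s \mid X)
         = \arg\max_{1 \le s \le S} \sum_{t=1}^{T} \log p(\mathbf{x}_t \mid \lambda_s).
\end{align}
```

Here w_i, \mu_i and \Sigma_i are the weight, mean and covariance of component i in the model \lambda_s.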

2. Speech Database
- The systems were evaluated on a Romanian speech database.
- Number of speakers: 200 (123 male and 77 female), from different age classes (student age, 18-22).
- Each speaker recorded 4 sentences (2 for training and 2 for testing), so the data used for training and for testing are different.
- Speakers were recorded in 2 or 3 sessions (time between sessions < 1 month).
- The speech is clean (laboratory background), recorded with a microphone and sampled at 22 kHz, 16 bit, mono.
- The training and testing sentences are 4 to 10 seconds long.
- Training sentences:
  - “Un numar de telefon este format din cifrele zero unu doi trei patru cinci sase sapte opt noua zece” (“A telephone number is made up of the digits zero one two three four five six seven eight nine ten”)
  - “Principalele operatii matematice sunt adunarea scaderea inmultirea si impartirea” (“The main mathematical operations are addition, subtraction, multiplication and division”)
- Testing sentences:
  - “Numarul meu de telefon este patru zero doi sase doi unu doi trei patru cinci” (“My telephone number is four zero two six two one two three four five”)
  - “Automobilul meu atinge o viteza de o suta optzeci de km pe ora” (“My car reaches a speed of one hundred and eighty km per hour”)
- Feature vectors: 12th-order MFCC (Mel Frequency Cepstral Coefficients), obtained from 20 mel-warped filter banks.
- Experiments:
  - first experiment: relation between number of mixture components and recognition performance
  - second experiment: relation between amount of training data (length of wav files), number of mixture components and recognition performance
  - third and fourth experiments: relation between number of iterations, number of mixture components and recognition performance
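As an illustration of the parameterization step, here is a minimal NumPy sketch that produces 12 cepstral coefficients per frame from a bank of 20 triangular mel filters. Only the numbers 12 and 20 and the 22 kHz sampling come from the slides. The exact rate of 22050 Hz, the 25 ms frame length, the 10 ms frame shift, the Hamming window, the FFT size of 1024, and the choice to drop coefficient c0 are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):        # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):       # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sample_rate=22050, frame_len=0.025, frame_shift=0.010,
         n_filters=20, n_ceps=12, n_fft=1024):
    """Return an array of shape (n_frames, n_ceps); all defaults are assumed values."""
    signal = np.asarray(signal, dtype=float)
    size = int(round(frame_len * sample_rate))
    step = int(round(frame_shift * sample_rate))
    n_frames = 1 + (len(signal) - size) // step
    # Slice the signal into overlapping frames and apply a Hamming window.
    idx = np.arange(size)[None, :] + step * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(size)
    # Power spectrum, mel filter-bank energies, then log compression.
    power = np.abs(np.fft.rfft(frames, n_fft, axis=1)) ** 2 / n_fft
    log_energy = np.log(power @ mel_filterbank(n_filters, n_fft, sample_rate).T + 1e-10)
    # DCT-II of the log energies; coefficients 1..n_ceps are kept (c0 is dropped).
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(1, n_ceps + 1), n + 0.5) / n_filters)
    return log_energy @ basis.T
```

Calling `mfcc(samples)` on a mono signal returns one 12-dimensional feature vector per frame, which is the input format assumed by a GMM speaker model.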

3. Relation between number of mixture components and recognition performance
- A GMM with full covariance matrices is the most complex model.
- We use a simplified form in which each component consists of a diagonal covariance matrix, a mean, and a weight.
- The models were tested with 10 seconds of speech used in the training process; results are shown in Figure 1.
- Larger models (with more components) give better results.
Figure 1. GMM with diagonal covariance matrix
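A diagonal-covariance GMM can be scored with only elementwise operations, which is part of why the simplified form is attractive. The sketch below computes the total log-likelihood of a sequence of feature vectors and picks the best-scoring speaker. It assumes equal speaker priors and independent frames, and the function names and the `models` dictionary layout are hypothetical, not taken from the slides.

```python
import numpy as np

def gmm_log_likelihood(X, weights, means, variances):
    """Total log-likelihood of frames X (T x D) under a diagonal-covariance GMM.

    weights: (M,), means: (M, D), variances: (M, D) diagonal entries.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - means[None, :, :]                      # T x M x D
    log_norm = -0.5 * (X.shape[1] * np.log(2 * np.pi)
                       + np.sum(np.log(variances), axis=1))       # M
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)  # T x M
    weighted = log_comp + np.log(weights)
    # Log-sum-exp over components, done stably by subtracting the row maximum.
    peak = weighted.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(weighted - peak).sum(axis=1))
    return float(per_frame.sum())

def identify(X, models):
    """Return the name of the best-scoring model and all scores.

    models maps a speaker name to a (weights, means, variances) tuple.
    """
    scores = {name: gmm_log_likelihood(X, *params) for name, params in models.items()}
    return max(scores, key=scores.get), scores
```

With one model per enrolled speaker, `identify(features, models)` returns the speaker whose model assigns the highest log-likelihood to the test utterance.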

4. Relation between amount of training data, number of mixture components and recognition performance
- GMMs of different sizes were trained on wav files of 4, 6 and 10 seconds.
- Recognition results improve with the number of mixture components and with the amount of training data.
- Results are shown in Figure 2 and recognition scores in Table 1 (best recognition results: M=12).
- Longer training data gives the best recognition results, visible on the right of the figure (higher number of components in the model).
- The recognition error is lower in the right part of the figure than in the left part.
- Small amounts of training data (4 and 6 seconds) are not ideal for the GMM method.
- Best results were achieved with 12 GMM components and wav files of 10 seconds.
Figure 2. Performance curves obtained from GMM models trained with different amounts of data
Table 1. GMM identification performance for different amounts of training data and model orders
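A rough count of parameters against available frames helps explain why 4 and 6 seconds of speech are limiting. The sketch below assumes 12-dimensional features (as in the database description) and a 10 ms frame shift, which is an assumption the slides do not state. The function names are hypothetical.

```python
def diagonal_gmm_parameters(n_components, dim=12):
    # Per component: dim means + dim variances + 1 weight.
    return n_components * (2 * dim + 1)

def full_gmm_parameters(n_components, dim=12):
    # Per component: dim means + dim*(dim+1)/2 covariance entries + 1 weight.
    return n_components * (dim + dim * (dim + 1) // 2 + 1)

def approx_frames(seconds, frame_shift=0.010):
    # Approximate number of feature vectors; the 10 ms shift is an assumed value.
    return round(seconds / frame_shift)

for seconds in (4, 6, 10):
    for m in (4, 8, 12):
        frames = approx_frames(seconds)
        params = diagonal_gmm_parameters(m)
        print(f"{seconds:>2} s, M={m:>2}: {frames} frames, {params} parameters, "
              f"{frames / params:.1f} frames per parameter")
```

Under these assumptions a 12-component diagonal model has 300 parameters, so 4 seconds of speech supplies only about 400 frames to estimate them, while the full-covariance equivalent would need 1092 parameters.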

5. Relation between number of iterations, number of mixture components and recognition performance
- We study the influence of the number of EM iterations on the recognition score: the score improves with 10 iterations, and 50 iterations are recommended.
- Results are shown in Figure 3 and Figure 4.
Figure 3. Influence of EM iterations on recognition performance obtained from GMM with diagonal covariance matrix for models with 4, 6, 8, 10 and 12 mixture components
Figure 4. Influence of EM iterations on recognition performance obtained from GMM with diagonal covariance matrix for models with 4, 8 and 12 mixture components
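To make the role of the iteration count concrete, here is a minimal sketch of EM training for a diagonal-covariance GMM that records the training log-likelihood at every iteration. The initialization (randomly chosen frames as means, global variance for every component) and the variance floor are assumptions, since the slides do not describe them. The function names are hypothetical.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Log density of each frame under each diagonal Gaussian, shape T x M."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (X.shape[1] * np.log(2 * np.pi)
                   + np.sum(np.log(variances), axis=1)
                   + np.sum(diff ** 2 / variances, axis=2))

def train_gmm_em(X, n_components, n_iter=50, var_floor=1e-3, seed=0):
    """Fit a diagonal GMM with EM; returns weights, means, variances, history."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    T, D = X.shape
    # Assumed initialization: random frames as means, global variance everywhere.
    means = X[rng.choice(T, n_components, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + var_floor, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    history = []
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each frame.
        log_joint = log_gauss_diag(X, means, variances) + np.log(weights)
        peak = log_joint.max(axis=1, keepdims=True)
        log_px = peak + np.log(np.exp(log_joint - peak).sum(axis=1, keepdims=True))
        history.append(float(log_px.sum()))   # log-likelihood before this update
        resp = np.exp(log_joint - log_px)     # T x M
        # M-step: re-estimate weights, means and diagonal variances.
        counts = resp.sum(axis=0) + 1e-10
        weights = counts / counts.sum()
        means = (resp.T @ X) / counts[:, None]
        variances = (resp.T @ X ** 2) / counts[:, None] - means ** 2
        variances = np.maximum(variances, var_floor)
    return weights, means, variances, history
```

Plotting `history` against the iteration index shows how quickly the training likelihood levels off, which is one way to judge whether 10 or 50 iterations are needed for a given model size. Note that training likelihood is not the same as the recognition score reported in the figures.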

6. Conclusions
- The performance of the method is good.
- Maximizing the use of speaker data (maximizing the size of the model) improves speaker recognition.
- A model that is too large for a small amount of training data reduces the performance of the recognition system.
- Best performance was obtained with:
  - 12 GMM mixture components
  - 50 iterations of the process
  - wav files of 10 seconds
