Text independent speaker identification in multilingual environments
I. Luengo, E. Navas, I. Sainz, I. Saratxaga, J. Sanchez, I. Odriozola and I. Hernaez

Contents
- Introduction
  - Speaker recognition (SR) in language-mismatched conditions
  - Existing solutions
  - Proposed solution
- Working database
- Variability measures
- Experimental results
- Conclusions

Speaker Recognition System
[Block diagram. TRAIN: feature extraction, training, model M. TEST: feature extraction, score, decision.]
Language mismatch between training and test? Accuracy decreases.
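The train/test pipeline in the diagram can be sketched roughly as follows. This is only an illustration: the slides do not say which model type is used, so a single diagonal Gaussian per speaker stands in for the model M, and the names `train_model`, `avg_log_likelihood` and `identify`, as well as the toy feature values, are hypothetical.

```python
import math

def train_model(frames):
    """Fit a diagonal Gaussian (per-dimension mean and variance) to feature frames.
    Assumption: a stand-in for the speaker model M, not the authors' model."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so the log-likelihood stays finite.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return mean, var

def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood of the test frames under one model."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)

def identify(frames, models):
    """Decision step: pick the speaker whose model scores the test frames best."""
    return max(models, key=lambda spk: avg_log_likelihood(frames, models[spk]))

# Toy 2-dimensional features (hypothetical values).
train = {
    "spk_a": [[0.0, 1.0], [0.2, 0.8], [-0.1, 1.1]],
    "spk_b": [[3.0, -1.0], [3.2, -0.9], [2.9, -1.2]],
}
models = {spk: train_model(frames) for spk, frames in train.items()}
test_frames = [[3.1, -1.0], [2.8, -1.1]]
decision = identify(test_frames, models)
```

A language mismatch shows up in this picture as test frames drawn from a distribution that differs from the one the model was trained on, which lowers the score of the true speaker.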

Existing solutions
- Multi-language training
  - One model per speaker, trained with several languages
  - The model learns the characteristics of the different languages
- Multi-model training
  - One model per language for each speaker
  - Needs a language detector

Existing solutions
- Drawbacks
  - The possible languages must be known in advance for each speaker
  - Does not generalize to languages not seen during training
  - More recording sessions are needed for training: more time, more money
- Desired solution: language independent
  - Suitable for languages not seen during training
  - Can be trained on a single language

Proposed solution
- Language-independent features
  - Normalization?
  - New features?
- Short-term intonation and energy values
  - High speaker discrimination capability
  - Global distribution may change little with language
  - Combinable with MFCC
  - Only in voiced frames (intonation)
  - High session variability
  - MVN (mean and variance normalization) for inter-session normalization
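A minimal sketch of how intonation and energy values might be appended to MFCC vectors and then normalized per session with MVN. The function names `combine` and `mvn`, the use of `None` to mark unvoiced frames, and the toy values are assumptions for illustration; the slides do not describe the actual front end.

```python
import math

def combine(mfcc, pitch, energy):
    """Append pitch and energy to each MFCC vector.
    Assumption: pitch is None in unvoiced frames, which are dropped."""
    return [m + [p, e] for m, p, e in zip(mfcc, pitch, energy) if p is not None]

def mvn(frames):
    """Mean and variance normalization over the frames of one session:
    each dimension ends up with zero mean and unit variance."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Fall back to 1.0 when a dimension is constant, to avoid dividing by zero.
    std = [math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / n) or 1.0
           for d in range(dim)]
    return [[(f[d] - mean[d]) / std[d] for d in range(dim)] for f in frames]

# Toy session with four frames; the second frame is unvoiced (hypothetical values).
mfcc = [[1.0, 2.0], [1.5, 1.0], [0.5, 3.0], [2.0, 2.5]]
pitch = [4.8, None, 5.0, 4.9]
energy = [10.0, 6.0, 11.0, 10.5]

combined = combine(mfcc, pitch, energy)
normalized = mvn(combined)
```

Computing the statistics separately for each session removes session-level offsets and scale differences before the features reach the speaker model.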

Database
- Bilingual Spanish-Basque speech database
  - 22 speakers (11 male, 11 female)
  - 4 sessions (inter-session variability)
  - 7 numeric sequences (8 digits each) per session and language

Variability measures
- Adding new features ALWAYS increases separability/variability
  - More speaker separability: more discrimination
  - More language variability: more model/test mismatch
  - More session variability: more model/test mismatch
- Key issue: does speaker separability increase more than language/session variability?

Variability measures
- Kullback-Leibler divergence for variability estimation
- Interesting measures:
  - Inter-speaker variability / inter-language variability
  - Inter-speaker variability / inter-session variability
- Good if the new features increase these ratios
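One way such a ratio could be computed is sketched below. The slides do not state how the divergence is estimated, so this example assumes that each speaker or language condition is summarized by a single diagonal Gaussian, for which the Kullback-Leibler divergence has a closed form, and it uses the symmetric version. All names and numbers are hypothetical.

```python
import math

def kl_diag_gauss(p, q):
    """KL(p || q) for diagonal Gaussians given as (means, variances)."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )

def sym_kl(p, q):
    """Symmetric divergence: KL(p || q) + KL(q || p)."""
    return kl_diag_gauss(p, q) + kl_diag_gauss(q, p)

def mean_pairwise(models):
    """Average symmetric divergence over all pairs of models."""
    pairs = [(a, b) for i, a in enumerate(models) for b in models[i + 1:]]
    return sum(sym_kl(a, b) for a, b in pairs) / len(pairs)

# Toy one-dimensional models (hypothetical values).
speaker_models = [([0.0], [1.0]), ([2.0], [1.0])]    # two different speakers
language_models = [([0.0], [1.0]), ([0.5], [1.0])]   # one speaker, two languages

inter_speaker = mean_pairwise(speaker_models)
inter_language = mean_pairwise(language_models)
ratio = inter_speaker / inter_language
```

Under this reading, a feature set is preferable when it raises the ratio, that is, when it separates speakers more than it separates the languages (or sessions) of the same speaker.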

Variability measures
[Table. Columns: MFCC, MFCC+P, Gain (%). Rows: Lang; Spk (S, B); Ses (S, B); Spk/Lang (S, B); Spk/Ses (S, B). The numeric values are missing from the transcript.]

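Experimental results
- X-Y: training in X, testing in Y
[Table. Columns: S-S, B-B, S-B, B-S, SB-S, SB-B. Rows: MFCC (ref), MFCC (V), MFCC+P (V), Gain (V). Only five Gain (V) values survive in the transcript, in this order: -0.5%, 13.4%, 9.0%, -0.5%, -1.3%.]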

Conclusions
- Short-term intonation and energy values increase language robustness
  - Little accuracy drop in language-matched conditions
- Very useful if the test language is unpredictable
- Variability measures predict the results reasonably well
  - Allows easy selection of features prior to experiments
