INCORPORATING MULTIPLE-HMM ACOUSTIC MODELING IN A MODULAR LARGE VOCABULARY SPEECH RECOGNITION SYSTEM IN TELEPHONE ENVIRONMENT A. Gallardo-Antolín, J. Ferreiros, J. Macías-Guarasa, R. de Córdoba and J.M. Pardo


Grupo de Tecnología del Habla, Universidad Politécnica de Madrid, Spain
{jfl, macias, cordoba,

SUMMARY
Previous work on this topic, presented at EUROSPEECH'99:
- Flexible large vocabulary (up to words)
- Speaker-independent, isolated-word recognition
- Telephone speech
- Two-stage, bottom-up strategy
- Neural networks as a novel approach to estimating the preselection list length
In this paper: integrating multiple acoustic models (gender-specific models) into this system.

SYSTEM ARCHITECTURE

TRAINING MULTIPLE-HMM ACOUSTIC MODELS
We have developed two different methods for training gender-dependent models:
- Independent training: each set of models is trained using only the part of the training database assigned to it.
- Joint training: each set of models is trained using all the utterances in the training database. A weighting function controls the influence of each utterance on the modeling of each set. P_A and P_B are the likelihoods of the utterance under model sets A and B; from them, the weighting function derives the weights applied to the re-estimation formulae in the training stage, and an adjustment factor allows assigning more training data to a particular set.

INCORPORATING MULTIPLE-HMMs IN THE RECOGNITION STAGE
Alternatives for the PSBU module:
- Combined-sets: phonetic strings are composed by concatenating models coming from any set.
- Single-set: phonetic strings are forced to be generated by only one set of models (the one that produces the best score).
Alternatives for the LA module:
- Shared-costs: all allophones behave identically in the LA stage, even if they were generated from different model sets; both sets share the same confusion matrix.
- Set-dependent costs: costs are gender-dependent, with one square confusion matrix per set (for the "single-set" PSBU strategy) or a single rectangular confusion matrix (for "combined-sets").

EXPERIMENTAL SETUP
VESTEL database, a realistic telephone speech corpus:
- Training set: 5810 utterances (46.74% male speakers, 53.26% female speakers)
- Test set: 1434 utterances (vocabulary-independent task)
- Vocabularies of 2000, 5000 and words

EXPERIMENTAL RESULTS
- Independent-training vs. joint-training: discrete HMMs; dictionary of 2000 words; combination 2 in the recognition stage.
- Alternatives for the PSBU and LA modules: discrete HMMs; dictionary of 2000 words; independent training for multiple DHMMs.
- Single-SCHMM vs. multiple-SCHMM: independent training for multiple SCHMMs; single-set in the PSBU module + shared-costs in the LA module; dictionaries of 2000, 5000 and words.

CONCLUSIONS
- Using multiple acoustic models per phonetic unit increases the robustness of acoustic modeling in difficult tasks.
- Best results are obtained with independent training, the single-set strategy in the PSBU module and shared costs in the LA module.
- A relative error reduction of around 25% is achieved with multiple-SCHMM modeling compared to the single-SCHMM system for a dictionary of words.
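The joint-training scheme above weights every utterance by how well each gender-dependent model set explains it. The slides do not give the actual weighting function, so the sketch below is an illustration only: the function name, the normalised-likelihood form, and the way the adjustment factor `alpha` biases data toward one set are all assumptions.

```python
def joint_training_weights(p_a, p_b, alpha=1.0):
    """Per-utterance weights for jointly re-estimating two
    gender-dependent model sets from the full training database.

    p_a, p_b : likelihoods of the utterance under model sets A and B
               (P_A and P_B on the slide).
    alpha    : adjustment factor; alpha > 1 assigns more training
               data to set A.

    This normalised likelihood ratio is one plausible choice of
    weighting function, not the one used in the paper.
    """
    weighted_a = alpha * p_a
    w_a = weighted_a / (weighted_a + p_b)
    return w_a, 1.0 - w_a
```

With `alpha = 1`, an utterance equally likely under both sets contributes equally to each re-estimation; raising `alpha` shifts its contribution toward set A.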
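The "single-set" PSBU alternative keeps only the phonetic string generated entirely by the best-scoring model set. A minimal sketch of that selection follows; the function name and the dictionary shape of the hypotheses are hypothetical, and higher scores are assumed to be better.

```python
def psbu_single_set(hypotheses):
    """'Single-set' PSBU strategy: the whole phonetic string must be
    generated by one model set, so keep the set whose string scores
    best.

    hypotheses: {set_name: (phonetic_string, score)} -- a hypothetical
    representation of one string per gender-dependent model set.
    """
    best_set = max(hypotheses, key=lambda s: hypotheses[s][1])
    string, score = hypotheses[best_set]
    return best_set, string, score
```

Under "combined-sets", by contrast, each allophone slot could draw from either set, which is why the LA module then needs a single rectangular confusion matrix rather than one square matrix per set.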