Presentation transcript:

Non-Native Speech Recognition Using Confusion-Based Acoustic Model Integration

Non-native Speech
- Languages have different pronunciation spaces, and speakers are used to uttering and recognizing the phones of their native language → non-native speakers make pronunciation errors and replace phones with others.
- For read speech or "inter-language words", the errors made by non-native speakers may depend on the spelling of the words → the graphemes (characters) should be taken into account.

Pronunciation modeling (1/2)
- Fully automated, data-driven process
- Needs HMM models of the SL (spoken language) and of the NL (native language)
- Needs a non-native speech database
- Process (diagram): the non-native database is force-aligned phonetically with the SL HMMs and decoded by phonetic recognition with the NL HMMs; comparing the two phone streams yields confusion rules, which are then used to modify the HMM models of the SL ASR system (a sketch of the rule extraction follows below).
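To make the rule-extraction step concrete, here is a minimal Python sketch (not the authors' actual implementation). It assumes each non-native utterance has already been decoded twice, once by forced alignment with the SL HMMs and once by free phonetic recognition with the NL HMMs, each pass returning time-stamped (phone, start, end) segments; rule probabilities are then simple relative frequencies.

```python
from collections import Counter, defaultdict

def extract_confusion_rules(utterances):
    """utterances: list of (sl_segments, nl_segments) pairs, where each
    segment is a (phone, start_time, end_time) tuple.

    For every SL phone segment, collect the NL phones whose midpoint falls
    inside it; each observed NL phone sequence is one realisation of that
    SL phone. Returns {sl_phone: {nl_phone_sequence: probability}}.
    """
    counts = defaultdict(Counter)
    for sl_segments, nl_segments in utterances:
        for sl_phone, sl_start, sl_end in sl_segments:
            realised = tuple(p for p, s, e in nl_segments
                             if sl_start <= (s + e) / 2.0 < sl_end)
            if realised:
                counts[sl_phone][realised] += 1
    rules = {}
    for sl_phone, seqs in counts.items():
        total = sum(seqs.values())
        rules[sl_phone] = {seq: n / total for seq, n in seqs.items()}
    return rules

# With enough Italian/Spanish/Greek data this could produce rules such as
# {'aI': {('a', 'i'): 0.6, ('a', 'e'): 0.4}}, as on the next slide.
```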

Pronunciation modeling (2/2)
- Example: the English diphthong [aI]
- Confusion rules when the NL is Italian, Spanish or Greek:
  [aI] → [a] [i]   (P = 0.6)
  [aI] → [a] [e]   (P = 0.4)
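In the system these rules are integrated into the acoustic models (alternative NL phone paths added in parallel to each SL phone model, weighted by the rule probabilities). The sketch below shows the equivalent, easier-to-read lexicon-level view, using the [aI] rules quoted above; the function and variable names are illustrative, not taken from the original code.

```python
from itertools import product

# Rules in the format produced by extract_confusion_rules(); the [aI]
# probabilities are the ones quoted on the slide.
rules = {"aI": {("a", "i"): 0.6, ("a", "e"): 0.4}}

def expand_pronunciation(sl_phones, rules):
    """Yield (variant, probability) pairs for one SL pronunciation.

    Each SL phone is either kept (no rule) or replaced by one of the NL
    phone sequences allowed by its confusion rules; the variant probability
    is the product of the rule probabilities used.
    """
    options = []
    for phone in sl_phones:
        options.append(list(rules.get(phone, {(phone,): 1.0}).items()))
    for combo in product(*options):
        variant = tuple(p for seq, _ in combo for p in seq)
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield variant, prob

# "like" /l aI k/ -> ('l','a','i','k') with P=0.6 and ('l','a','e','k') with P=0.4
for variant, prob in expand_pronunciation(("l", "aI", "k"), rules):
    print(variant, prob)
```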

Graphemic Constraints (1/2)
- Matching between graphemes and phones:
  Example 1: APPROACH /ah p r ow ch/ → (ah, A) (p, PP) (r, R) (ow, OA) (ch, CH)
  Example 2: POSITION /p ah z ih sh ah n/ → (p, P) (ah, O) (z, S) (ih, I) (sh, TI) (ah, O) (n, N)
- New lexicon generation: link phones to graphemes
- Confusion rule extraction: the rules implicitly include the graphemic constraints, i.e. (English phone, grapheme) → list of NL phones, e.g. (ah, A) → a but (ah, O) → o
- Recognition (a sketch of the grapheme-conditioned lookup follows below)
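A small sketch of what "rules implicitly include the graphemic constraints" means in data-structure terms: the lexicon stores (phone, grapheme) pairs as on the slide, and the confusion rules are keyed by that pair instead of by the phone alone. The (ah, A) → a and (ah, O) → o mappings come from the slide; the probabilities and helper names are assumptions.

```python
# Lexicon entries as on the slide: a word maps to (phone, grapheme) pairs.
lexicon = {
    "APPROACH": [("ah", "A"), ("p", "PP"), ("r", "R"), ("ow", "OA"), ("ch", "CH")],
    "POSITION": [("p", "P"), ("ah", "O"), ("z", "S"), ("ih", "I"),
                 ("sh", "TI"), ("ah", "O"), ("n", "N")],
}

# Confusion rules keyed by (SL phone, grapheme): the same English phone can
# rewrite differently depending on its spelling.
grapheme_rules = {
    ("ah", "A"): {("a",): 1.0},   # (ah, A) -> a
    ("ah", "O"): {("o",): 1.0},   # (ah, O) -> o
}

def nl_rewrites(word):
    """For each (phone, grapheme) of `word`, list the allowed NL rewrites,
    falling back to the SL phone itself when no rule covers the pair."""
    return [grapheme_rules.get(pg, {(pg[0],): 1.0}) for pg in lexicon[word]]

print(nl_rewrites("POSITION"))
```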

Graphemic Constraints (2/2)
- Extracting the phone-grapheme associations: a discrete HMM system is trained on the phonetic dictionary, then forced alignment yields the phone-grapheme associations.
- Applying the graphemic constraints: forced alignment of the target lexicon with the trained discrete HMM system and the phone-grapheme associations produces a modified target lexicon that includes the phone-grapheme associations (a simplified alignment sketch follows below).
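The slide obtains the phone-grapheme associations with a trained discrete HMM and forced alignment. As a rough, self-contained stand-in, the sketch below re-estimates chunk probabilities over a few passes with a Viterbi-style dynamic programme on the phonetic dictionary; the 1-to-3-letter chunk limit, the probability floor and all names are assumptions rather than the authors' setup.

```python
import math
from collections import Counter, defaultdict

MAX_CHUNK = 3  # assume a phone maps to 1..3 letters

def align(phones, word, prob):
    """Best split of `word` into len(phones) grapheme chunks (Viterbi DP).
    prob[(phone, chunk)] is the current estimate of P(chunk | phone);
    unseen pairs get a small floor so new splits stay reachable."""
    n, m = len(phones), len(word)
    INF = float("inf")
    best = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[0] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            for k in range(1, MAX_CHUNK + 1):
                if j - k < 0 or best[i - 1][j - k] == INF:
                    continue
                chunk = word[j - k:j]
                cost = best[i - 1][j - k] - math.log(prob.get((phones[i - 1], chunk), 1e-6))
                if cost < best[i][j]:
                    best[i][j], back[i][j] = cost, j - k
    assert best[n][m] < INF, "no split with 1..MAX_CHUNK letters per phone"
    pairs, j = [], m
    for i in range(n, 0, -1):
        pj = back[i][j]
        pairs.append((phones[i - 1], word[pj:j]))
        j = pj
    return list(reversed(pairs))

def train_associations(entries, iterations=3):
    """entries: [(word, phone_list), ...] taken from the phonetic dictionary.
    Alternates alignment and re-estimation of P(chunk | phone)."""
    prob = {}  # flat start: every pairing gets the floor on the first pass
    for _ in range(iterations):
        counts = defaultdict(Counter)
        for word, phones in entries:
            for phone, chunk in align(phones, word, prob):
                counts[phone][chunk] += 1
        prob = {(phone, chunk): n / sum(c.values())
                for phone, c in counts.items() for chunk, n in c.items()}
    return prob

entries = [("APPROACH", ["ah", "p", "r", "ow", "ch"]),
           ("POSITION", ["p", "ah", "z", "ih", "sh", "ah", "n"])]
print(train_associations(entries))
```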

Experiments (1/3)
- HIWIRE non-native database: 31 French, 20 Italian, 20 Greek and 10 Spanish speakers
- 100 sentences per speaker (THALES grammar); the first 50 sentences are used for development, the last 50 for testing
- Front end: 13 MFCCs + Δ + ΔΔ; 128-Gaussian mixtures
- "Pronunciation modeling" carried out for each NL
- The baseline is compared with pronunciation modeling (PM) and MLLR, using the THALES grammar and a word-loop grammar
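For illustration, a sketch of the 39-dimensional front end described above (13 MFCCs plus first and second derivatives), here computed with librosa; the slide does not give the original window, filterbank or normalisation settings, so the parameters below are assumptions. The 128-Gaussian mixtures per HMM state would then be trained on these vectors.

```python
import librosa
import numpy as np

def mfcc_39(wav_path):
    """13 MFCCs + delta + delta-delta, 25 ms windows every 10 ms at 16 kHz."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=400, hop_length=160)
    delta = librosa.feature.delta(mfcc)
    delta2 = librosa.feature.delta(mfcc, order=2)
    return np.vstack([mfcc, delta, delta2]).T  # shape: (frames, 39)
```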

Experiments (2/3): results with the THALES grammar, WER / SER in % (– : not available)

                                        French        Italian       Spanish       Greek         Average
Baseline                                6.0 / 12.8    10.5 / 19.6   7.0 / 14.9    5.8 / 13.2    7.3 / 15.1
Phonetic confusion                      4.4 / 10.2    6.9 / 14.1    5.1 / 11.8    2.9 / 7.5     4.8 / 10.9
Phonetic conf. + graphemic constraints  4.9 / 11.3    8.2 / 15.9    6.2 / 13.6    6.3 / 14.0    –
Baseline + MLLR                         4.3 / 8.9     7.3 / 13.6    5.1 / 11.1    3.6 / 9.4     – / 10.8
Phonetic confusion + MLLR               3.1 / 7.2     4.9 / 11.5    3.4 / 8.0     2.3 / 6.5     – / 8.3
Phonetic conf. + graph. constr. + MLLR  3.7 / 8.5     – / 14.1      4.8 / 9.8     – / 12.7      5.0 / 11.3

Experiments (3/3): results with a "word-loop" grammar, WER / SER in % (– : not available)

                                        French        Italian       Spanish       Greek         Average
Baseline                                37.7 / 47.9   45.5 / 52.0   39.9 / 53.5   36.7 / –      40.0 / 50.7
Phonetic confusion                      27.3 / 42.1   31.3 / 46.2   29.5 / 44.5   20.3 / 35.1   27.1 / 42.0
Phonetic conf. + graphemic constraints  26.2 / 41.9   30.5 / 46.5   24.3 / 43.0   28.1 / 44.2   –
Baseline + MLLR                         28.4 / 39.4   34.9 / 46.5   32.3 / 48.3   28.5 / 31.0   32.2 / 42.7
Phonetic confusion + MLLR               23.0 / 36.6   25.2 / 40.6   24.7 / 40.1   18.1 / 31.3   22.8 / 37.2
Phonetic conf. + graph. constr. + MLLR  25.6 / 41.2   25.9 / 39.6   21.8 / 38.5   24.1 / 39.0   –
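As a reminder of how the two figures in these tables are obtained: WER is the total number of word substitutions, insertions and deletions divided by the number of reference words, and SER is the fraction of test sentences containing at least one error. A minimal sketch (helper names are illustrative):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def wer_ser(pairs):
    """pairs: [(reference_words, hypothesis_words), ...] for one test set.
    Returns (WER %, SER %)."""
    errors = words = wrong_sentences = 0
    for ref, hyp in pairs:
        d = edit_distance(ref, hyp)
        errors += d
        words += len(ref)
        wrong_sentences += int(d > 0)
    return 100.0 * errors / words, 100.0 * wrong_sentences / len(pairs)

# e.g. wer_ser([("set heading two seven zero".split(),
#                "set heading two seven".split())])
```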

Conclusion
- Fully automated, multilingual method for non-native speech recognition
- Performs slightly better than MLLR; phonetic confusion combined with MLLR gives still better results
- Graphemic constraints did not lead to improvements: to be investigated further
- 9 more French speakers have been recorded
- Future work: automatic detection of the speaker's native language

Publications
- "Fully Automated Non-Native Speech Recognition Using Confusion-Based Acoustic Model Integration". In Proc. Eurospeech/Interspeech, Lisbon, Portugal, September 2005.
- "Fully Automated Non-Native Speech Recognition Using Confusion-Based Acoustic Model Integration and Graphemic Constraints". In Proc. ICASSP, Toulouse, France, May 2006.
- "Reconnaissance de parole non native fondée sur l'utilisation de confusion phonétique et de contraintes graphémiques" (Non-native speech recognition based on phonetic confusion and graphemic constraints). In Proc. JEP 2006, Saint-Malo, France, June 2006.
- "Multilingual Non-Native Speech Recognition using Phonetic Confusion-Based Acoustic Model Modification and Graphemic Constraints". In Proc. ICSLP, Pittsburgh, USA, September 2006.
- A journal article for Speech Communication is in preparation.