1
Non-Native Speech Recognition Using Confusion-Based Acoustic Model Integration
2
Non-native Speech
Languages have different pronunciation spaces
Speakers are used to uttering and recognizing the phones of their native language
Non-native speakers make pronunciation errors and replace phones with others
Read speech or “inter-language words”: errors made by non-native speakers may depend on the spelling of the words
Take the graphemes (characters) into account
3
Pronunciation modeling (1/2)
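Fully automated, data-driven process
Needs HMM models of the SL (spoken language) and the NL (native language)
Needs a non-native speech database
[Diagram: the non-native database goes through phonetic alignment with the SL HMMs and phonetic recognition with the NL HMMs; the two outputs yield confusion rules, which are used to modify the HMM models of the SL ASR system]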
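The rule-extraction step can be illustrated with a minimal sketch. It assumes the alignment and recognition outputs have already been matched into (SL phone, NL phone sequence) pairs; the function name, the `min_prob` pruning threshold and the toy counts are assumptions for illustration and do not come from the slides.

```python
from collections import Counter, defaultdict

def extract_confusion_rules(aligned_pairs, min_prob=0.1):
    """Estimate P(NL phone sequence | SL phone) by relative frequency.

    aligned_pairs: iterable of (sl_phone, nl_phone_sequence) tuples, assumed
    to come from matching the SL alignment with the NL phone recognition.
    min_prob: hypothetical pruning threshold (not taken from the slides).
    """
    counts = defaultdict(Counter)
    for sl_phone, nl_seq in aligned_pairs:
        counts[sl_phone][tuple(nl_seq)] += 1

    rules = {}
    for sl_phone, seq_counts in counts.items():
        total = sum(seq_counts.values())
        # Keep only the sufficiently frequent confusions, then renormalize.
        kept = {seq: c / total for seq, c in seq_counts.items() if c / total >= min_prob}
        norm = sum(kept.values())
        rules[sl_phone] = {seq: p / norm for seq, p in kept.items()}
    return rules

# Toy data (made up) chosen so the result has probabilities 0.6 and 0.4.
pairs = [("aI", ["a", "i"])] * 6 + [("aI", ["a", "e"])] * 4
rules = extract_confusion_rules(pairs)
print(rules["aI"])
```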
4
Pronunciation modeling (2/2)
English diphthong [aI]
Confusion rules when the NL is Italian, Spanish and Greek:
[aI] → [a] [i], P = 0.6
[aI] → [a] [e], P = 0.4
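As a rough illustration of what such rules express, the sketch below expands a phone sequence into weighted non-native variants. This is a simplified lexicon-level view, not the HMM-level model modification described in the slides; the function name and the example sequence /t aI m/ are assumptions.

```python
from itertools import product

# Confusion rules for [aI] as given on the slide: NL phone sequence and probability.
RULES = {"aI": [(("a", "i"), 0.6), (("a", "e"), 0.4)]}

def expand_pronunciation(phones, rules):
    """Return every variant of `phones` licensed by `rules`, with its probability.

    Phones without a rule are kept unchanged with probability 1.0.
    """
    options = [rules.get(p, [((p,), 1.0)]) for p in phones]
    variants = []
    for combo in product(*options):
        seq = [ph for sub, _ in combo for ph in sub]
        prob = 1.0
        for _, p in combo:
            prob *= p
        variants.append((seq, prob))
    return variants

# Hypothetical example: a word pronounced /t aI m/.
variants = expand_pronunciation(["t", "aI", "m"], RULES)
for seq, prob in variants:
    print(" ".join(seq), prob)
```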
5
Graphemic Constraints (1/2)
Matching between graphemes and phones
Example 1: APPROACH /ah p r ow ch/
APPROACH: (ah, A) (p, PP) (r, R) (ow, OA) (ch, CH)
Example 2: POSITION /p ah z ih sh ah n/
POSITION: (p, P) (ah, O) (z, S) (ih, I) (sh, TI) (ah, O) (n, N)
New lexicon generation: link phones to graphemes
Confusion rules extraction
Rules implicitly include the graphemic constraints: (English phone, grapheme) → list of NL phones
e.g. (ah, A) → a, (ah, O) → o
Recognition
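The lexicon entries and rules above can be written down as plain data. In the sketch below the two entries and the two rules are copied from the slide, while the data layout and the `nl_candidates` helper are assumptions made for illustration.

```python
# Lexicon entries as (phone, grapheme) pairs, copied from the two slide examples.
APPROACH = [("ah", "A"), ("p", "PP"), ("r", "R"), ("ow", "OA"), ("ch", "CH")]
POSITION = [("p", "P"), ("ah", "O"), ("z", "S"), ("ih", "I"),
            ("sh", "TI"), ("ah", "O"), ("n", "N")]

# Rules keyed by (English phone, grapheme), as on the slide.
GRAPHEMIC_RULES = {("ah", "A"): ["a"], ("ah", "O"): ["o"]}

def nl_candidates(entry, rules):
    """Attach to each (phone, grapheme) pair the NL phones its rule proposes.

    Pairs without a rule get an empty candidate list.
    """
    return [(phone, graph, rules.get((phone, graph), [])) for phone, graph in entry]

# Sanity check: the graphemes of an entry spell the word back.
print("".join(g for _, g in APPROACH))
print("".join(g for _, g in POSITION))

# The same English phone "ah" maps to different NL phones depending on its grapheme.
for phone, graph, cands in nl_candidates(APPROACH, GRAPHEMIC_RULES) + nl_candidates(POSITION, GRAPHEMIC_RULES):
    if cands:
        print(phone, graph, cands)
```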
6
Graphemic Constraints (2/2)
Extracting the phone-grapheme associations: phonetic dictionary → training → trained discrete HMM system → forced alignment → phone-grapheme associations
Applying the graphemic constraints: target lexicon → forced alignment (trained discrete HMM system, phone-grapheme associations) → modified target lexicon, which includes the phone-grapheme associations
7
Experiments (1/3): HIWIRE non-native database
31 French, 20 Italian, 20 Greek and 10 Spanish speakers
100 sentences per speaker, THALES grammar
First 50 sentences for development, last 50 for testing
13 MFCC + Δ + ΔΔ, 128 Gaussian mixtures
“Pronunciation modeling” (PM) for each NL
Tests of the baseline vs. PM and MLLR
THALES grammar and word-loop grammar
8
Experiments (2/3) using THALES grammar
WER and SER (%) in the column order French, Italian, Spanish, Greek, Average (WER then SER for each)
Baseline: 6.0 12.8 10.5 19.6 7.0 14.9 5.8 13.2 7.3 15.1
Phonetic confusion: 4.4 10.2 6.9 14.1 5.1 11.8 2.9 7.5 4.8 10.9
Phonetic confusion + graphemic constraints: 4.9 11.3 8.2 15.9 6.2 13.6 6.3 14.0 (two values missing)
Baseline + MLLR: 4.3 8.9 7.3 13.6 5.1 11.1 3.6 9.4 10.8 (one value missing)
Phonetic confusion + MLLR: 3.1 7.2 4.9 11.5 3.4 8.0 2.3 6.5 8.3 (one value missing)
Phonetic confusion + graphemic constraints + MLLR: 3.7 8.5 14.1 4.8 9.8 12.7 5.0 11.3 (two values missing)
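For readers unfamiliar with the two metrics, here is a minimal sketch of how WER (word error rate) and SER (sentence error rate) are commonly computed. The slides do not say which scoring tool was used, so this is the standard edit-distance definition rather than the authors' exact procedure; the example sentences are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two word lists (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]

def wer_ser(refs, hyps):
    """Return (WER, SER) in percent over paired reference/hypothesis sentences."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    wrong = sum(r.split() != h.split() for r, h in zip(refs, hyps))
    return 100 * errors / words, 100 * wrong / len(refs)

# Hypothetical sentences: one substitution in five reference words, one of two sentences wrong.
refs = ["select heading two", "range forty"]
hyps = ["select heading two", "range fifty"]
print(wer_ser(refs, hyps))
```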
9
Experiments (3/3) using a “word-loop” grammar
WER and SER (%) in the column order French, Italian, Spanish, Greek, Average (WER then SER for each)
Baseline: 37.7 47.9 45.5 52.0 39.9 53.5 36.7 40.0 50.7 (one value missing)
Phonetic confusion: 27.3 42.1 31.3 46.2 29.5 44.5 20.3 35.1 27.1 42.0
Phonetic confusion + graphemic constraints: 26.2 41.9 30.5 46.5 24.3 43.0 28.1 44.2 (two values missing)
Baseline + MLLR: 28.4 39.4 34.9 46.5 32.3 48.3 28.5 31.0 32.2 42.7
Phonetic confusion + MLLR: 23.0 36.6 25.2 40.6 24.7 40.1 18.1 31.3 22.8 37.2
Phonetic confusion + graphemic constraints + MLLR: 25.6 41.2 25.9 39.6 21.8 38.5 24.1 39.0 (two values missing)
10
Conclusion
Fully automated, multilingual method for non-native speech recognition
Performs slightly better than MLLR
Phonetic confusion + MLLR gives even better results
Graphemic constraints did not lead to improvements: to be investigated further
9 more French speakers recorded
Future work: automatic detection of the speaker's native language
11
Publications
“Fully Automated Non-Native Speech Recognition Using Confusion-Based Acoustic Model Integration”. In Proc. Eurospeech/Interspeech, Lisbon, Portugal, September 2005.
“Fully Automated Non-Native Speech Recognition Using Confusion-Based Acoustic Model Integration and Graphemic Constraints”. In Proc. ICASSP, Toulouse, France, May 2006.
“Reconnaissance de parole non native fondée sur l'utilisation de confusion phonétique et de contraintes graphémiques” [Non-native speech recognition based on phonetic confusion and graphemic constraints]. In Proc. JEP06, Saint-Malo, France, June 2006.
“Multilingual Non-Native Speech Recognition using Phonetic Confusion-Based Acoustic Model Modification and Graphemic Constraints”. In Proc. ICSLP, Pittsburgh, USA, September 2006.
A journal article for SpeechCom is being written.