Research & Development ICASSP'
Analysis of Model Adaptation on Non-Native Speech for Multiple Accent Speech Recognition
D. Jouvet & K. Bartkova
France Télécom - R&D
Research & Development Multilingual Units for Modeling Pronunciation Variants – ICASSP'

Overview
- Multiple foreign accent speech corpus
- Baseline native speech modeling and results
- Modeling non-native speech variants
  - Phonological rules
  - Units trained on foreign data
  - Selection of variants
- Adaptation on non-native speech
  - On all types of foreign accents
  - Only on subsets of foreign accents
- Conclusion

Multiple Foreign Accent Speech Corpus
83 French words and expressions collected over the telephone

Cluster   Language group   Test set       Originating countries (24 in total)
French    French           94 speakers    France, Belgium, Switzerland, …
EsEnDe    Spanish          35 speakers    Spain
EsEnDe    English          96 speakers    USA, UK, Ireland, …
EsEnDe    German           113 speakers   Germany, Austria
Other     Italian          56 speakers    Italy
Other     Portuguese       17 speakers    Portugal
Other     African          50 speakers    Senegal, Congo, Mali, …
Other     Arabic           53 speakers    Algeria, Tunisia, Morocco
Other     Turkish          53 speakers    Turkey
Other     Cambodian        48 speakers    Cambodia
Other     Asian            69 speakers    China, Vietnam

Baseline Modeling and Results Using Native Speech Models
- Modeling: MFCC, HMM, Gaussian mixtures, context-dependent models
- Baseline M1.A1: native French acoustic units only (model M1), trained on a large French speech corpus (acoustic parameters A1)
- Large dispersion of recognition performance across speaker language groups (error rates: 6% for German speakers … 12% for English and Spanish speakers)

Modeling Non-Native Speech Variants
Variants Derived through Phonological Rules
- Open / close vowel apertures both allowed: e ⇨ (e + ɛ)
- Possible denasalization of nasal sounds: ɛ̃ ⇨ (ɛ̃ + ɛN), where N = n, m or ŋ
- Difficulty pronouncing the front rounded vowel /y/ (⇨ /u/) and the semi-vowel /Y/ (⇨ /w/)
- Applying these rules gives model M2
- Significant improvement for many language groups (not all), and better results overall
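The way such rules multiply the pronunciations of a word can be sketched as follows. This is only an illustration, not the authors' implementation: the rule table encodes just the three rules listed on the slide, the function name `expand` is hypothetical, and keeping the canonical phone as an alternative for /y/ and /Y/ is an assumption.

```python
from itertools import product

OPEN_E = "\u025b"          # open e
NASAL_E = "\u025b\u0303"   # nasal open e (open e + combining tilde)
ENG = "\u014b"             # velar nasal

# Each phone maps to its allowed realizations; a realization is a list of
# phones because denasalization replaces one phone by two.
RULES = {
    "e": [["e"], [OPEN_E]],
    NASAL_E: [[NASAL_E]] + [[OPEN_E, n] for n in ("n", "m", ENG)],
    "y": [["y"], ["u"]],   # assumption: canonical /y/ is kept as a variant
    "Y": [["Y"], ["w"]],   # assumption: canonical /Y/ is kept as a variant
}


def expand(pronunciation):
    """Return every pronunciation variant allowed by RULES."""
    options = [RULES.get(phone, [[phone]]) for phone in pronunciation]
    return [
        [phone for segment in choice for phone in segment]
        for choice in product(*options)
    ]


# A made-up phone sequence with one rule-affected phone gives two variants.
print(expand(["s", "y", "r"]))
```

The number of variants grows as the product of the alternatives per phone, which is one reason a later selection step can be useful.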

Modeling Non-Native Speech Variants
Adding Units Trained on Foreign Data
- Foreign standard units
  - Standard training, e.g. German units trained from German words uttered by German speakers: φ_de_DE
  - For each French unit, the corresponding foreign units are added for recognition
- French units adapted on foreign data
  - Mapping between French and foreign units for training, for example Paris_uk:
    p_uk . a_uk . r_uk . i_uk . s_uk
    p_fr . a_fr . r_fr . i_fr . s_fr
  - Hence, here, French units adapted on English speech material: φ_fr_UK
- Resulting unit sets, shown for /e/
  - Model M3: e_sp_SP, e_fr_FR, e_uk_UK, e_de_DE
  - Model M4: e_fr_SP, e_fr_FR, e_fr_UK, e_fr_DE
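The unit naming used on this slide (phone, language of the unit, language of the training speech) can be illustrated with a small sketch. The helper names `unit_variants` and `map_to_french` are hypothetical, and the accent list is limited to the three shown on the slide.

```python
def unit_variants(phone, accents=("SP", "UK", "DE"), adapted=True):
    """List the units available for one French phone at recognition time.

    adapted=False: add standard foreign units, e.g. e_uk_UK (as in model M3).
    adapted=True:  add French units adapted on foreign data, e.g. e_fr_UK
                   (as in model M4).
    """
    variants = [f"{phone}_fr_FR"]  # the native French unit is always kept
    for accent in accents:
        unit_language = "fr" if adapted else accent.lower()
        variants.append(f"{phone}_{unit_language}_{accent}")
    return variants


def map_to_french(foreign_units):
    """Replace the language tag of each unit by 'fr', e.g. p_uk becomes p_fr."""
    return [unit.rsplit("_", 1)[0] + "_fr" for unit in foreign_units]


print(unit_variants("e", adapted=False))
print(unit_variants("e", adapted=True))
print(map_to_french(["p_uk", "a_uk", "r_uk", "i_uk", "s_uk"]))
```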

Modeling Non-Native Speech Variants
Adding Units Trained on Foreign Data
- Adding "standard foreign units" vs. "French units adapted on foreign data"
- Better results are obtained when adding French units adapted on foreign data
- Improvement on non-native speech, even for languages that do not correspond to the added units

Modeling Non-Native Speech Variants
Adding a Selection of Foreign Adapted Units
- Instead of keeping all variants (units) added for each phoneme, only the most frequent ones are kept (model M5); statistics come from forced alignments on the adaptation set
- Smaller performance degradation (due to added units) on French speakers
- Smaller improvement on the language groups associated with the added units
- Better results on other language groups
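A frequency-based selection of this kind might look like the sketch below. It assumes the forced alignment has already produced a flat list of unit labels in the `phone_language_accent` form used earlier; the function name `select_variants` and the cut-off `keep` are hypothetical, since the slides do not state how many variants are retained.

```python
from collections import Counter, defaultdict


def select_variants(aligned_units, keep=2):
    """Keep, for each phone, the `keep` units chosen most often in alignment."""
    counts = defaultdict(Counter)
    for unit in aligned_units:
        phone = unit.split("_", 1)[0]  # "e_fr_UK" belongs to phone "e"
        counts[phone][unit] += 1
    return {
        phone: [unit for unit, _ in counter.most_common(keep)]
        for phone, counter in counts.items()
    }


# Made-up alignment counts, for illustration only.
aligned = ["e_fr_FR"] * 5 + ["e_fr_UK"] * 3 + ["e_fr_DE"] + ["a_fr_FR"] * 2
print(select_variants(aligned, keep=2))
```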

Adaptation on Non-native Speech
- Adaptation set: about the same size as the test set; exhibits similar non-native accents (same countries)
- Non-native speech adaptation corpus: French words pronounced by foreign speakers, …

Generic models   Units                                                       Accent-adapted models
M1.A1 & M2.A1    French native units without / with phonological rules      M1.A5 & M2.A5
M3.A1            French native units & standard foreign units                M3.A5
M4.A1 & M5.A1    French native units & French units adapted on foreign data  M4.A5 & M5.A5

Adaptation on Non-native Speech
Adaptation Using All Types of Accents
- After adaptation on all accents, the behavior of the various modeling variants is similar to that obtained with the generic models

Adaptation on Non-native Speech
Impact of Types of Accents (1)
- Experiments using the best model (model M5)
- Reference results with generic parameters (model M5.A1)
- Adaptation using data from French speakers only (model M5.A2): corresponds to task and context adaptation
- Adaptation using data from a limited set of accents: Spanish, English and German speakers only (model M5.A3)
- Adaptation using data from other types of accents: Italian, Portuguese, … and Asian speakers only (model M5.A4)
- Results after adaptation using all types of accents (model M5.A5)

Adaptation on Non-native Speech
Impact of Types of Accents (2)
- Adaptation on French speakers only (M5.A2) improves results on almost all accented data
- Best results are obtained with adaptation on all types of accents (M5.A5)

Adaptation on Non-native Speech
Impact of Types of Accents (3)
- After adaptation on only a few types of accents: Es, En, De (i.e. model M5.A3)
- Large improvement achieved on all accented data, including accents that are not present in the adaptation set
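To make explicit which test accents are absent from a given adaptation set, the bookkeeping can be sketched as below. Only the two adaptation sets whose contents are fully listed in the slides (A2 and A3) are encoded; the group names come from the corpus table, and the function name `unseen_accents` is hypothetical.

```python
# Non-native language groups of the test set, as listed in the corpus table.
NON_NATIVE_GROUPS = [
    "Spanish", "English", "German", "Italian", "Portuguese",
    "African", "Arabic", "Turkish", "Cambodian", "Asian",
]

# Language groups contributing adaptation data, per acoustic parameter set.
ADAPTATION_GROUPS = {
    "A2": {"French"},                         # French speakers only
    "A3": {"Spanish", "English", "German"},   # limited set of accents
}


def unseen_accents(adaptation_set):
    """Return the non-native test groups with no data in the adaptation set."""
    seen = ADAPTATION_GROUPS[adaptation_set]
    return [group for group in NON_NATIVE_GROUPS if group not in seen]


print(unseen_accents("A3"))
```

With A3, the seven groups of the "Other" cluster are unseen during adaptation, which is the case this slide reports on.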

Conclusion
- Non-native speech recognition benefits from pronunciation variants
  - Application of phonological rules and introduction of units trained on foreign data
  - Selection of variants is beneficial
- Adaptation on non-native speech provides a large improvement for each type of modeling, and variants remain useful
- Adaptation on speech data representing a limited set of foreign accents is also beneficial for other types of accents