Advances in WP2 Trento Meeting – January 2007 www.loquendo.com.



2 Activities on WP2 since last meeting
- Focus on WP1 (PEQ), WP3 (mobile platform) and WP4 (assessment)
- Test of adaptation on a project corpus:
  – Hiwire Noisy Non-Native Corpus

3 LHN Adaptation
[Figure: Speaker Independent MLP (SI-MLP) with LHN. Labels: Speech signal parameters; Input layer; 1st hidden layer; 2nd hidden layer; LHN; Output layer; Emission probabilities; Acoustic-phonetic units.]

4 LHN Training
- The global SI-MLP+LHN system is trained with vocal material from the target speaker
- The LHN is initialized with an identity matrix
- LHN weights are trained with error back-propagation through the last layer of weights
- The original NN weights are kept frozen
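The training recipe on the slide above can be illustrated with a minimal NumPy sketch. It is not Loquendo's implementation: the layer sizes, the tanh and softmax activations, the cross-entropy loss, the learning rate, the random toy data, and the names `LHNAdaptedMLP` and `make_toy_si_mlp` are all assumptions made for illustration. The sketch also assumes the LHN sits between the last hidden layer and the output layer, which is what "back-propagation through the last layer of weights" suggests.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


class LHNAdaptedMLP:
    """Toy SI-MLP with a linear hidden network (LHN) before the output layer."""

    def __init__(self, W1, b1, W2, b2, W_out, b_out):
        # Speaker-independent weights: never updated during adaptation.
        self.W1, self.b1 = W1, b1
        self.W2, self.b2 = W2, b2
        self.W_out, self.b_out = W_out, b_out
        n_hidden = W2.shape[1]
        # LHN starts as the identity, so the adapted net equals the SI-MLP.
        self.A = np.eye(n_hidden)
        self.c = np.zeros(n_hidden)

    def hidden(self, x):
        h1 = np.tanh(x @ self.W1 + self.b1)
        return np.tanh(h1 @ self.W2 + self.b2)

    def forward(self, x):
        h2 = self.hidden(x)
        h_lhn = h2 @ self.A + self.c          # linear transform of hidden units
        return softmax(h_lhn @ self.W_out + self.b_out), h2

    def adapt_step(self, x, targets, lr=0.05):
        """One gradient step on the LHN only; returns the loss before the step."""
        probs, h2 = self.forward(x)
        n = x.shape[0]
        onehot = np.zeros_like(probs)
        onehot[np.arange(n), targets] = 1.0
        d_logits = (probs - onehot) / n        # gradient of mean cross-entropy
        d_lhn = d_logits @ self.W_out.T        # back-propagate through frozen output layer
        self.A -= lr * (h2.T @ d_lhn)
        self.c -= lr * d_lhn.sum(axis=0)
        return float(-np.log(probs[np.arange(n), targets] + 1e-12).mean())


def make_toy_si_mlp(rng, n_in=6, n_h1=10, n_h2=8, n_out=4, scale=0.3):
    """Random stand-in for a trained SI-MLP (hypothetical sizes)."""
    params = []
    for rows, cols in [(n_in, n_h1), (n_h1, n_h2), (n_h2, n_out)]:
        params += [scale * rng.standard_normal((rows, cols)), np.zeros(cols)]
    return params


rng = np.random.default_rng(0)
model = LHNAdaptedMLP(*make_toy_si_mlp(rng))
x = rng.standard_normal((32, 6))               # toy "adaptation" frames
y = rng.integers(0, 4, size=32)                # toy target unit labels
losses = [model.adapt_step(x, y) for _ in range(100)]
print(f"loss before: {losses[0]:.3f}, after: {losses[-1]:.3f}")
```

Because only `A` and `c` receive updates, the number of speaker-specific parameters stays small compared with the full network, which is the usual motivation for this kind of linear-transform adaptation.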

5 Hiwire Noisy Corpus
- Recorded in a cockpit simulator with two noise levels
- Microphone array + beamforming (ITC)
- 5 non-native speakers
- Each speaker read one list of 100 sentences
- Sentences from the Hiwire Fixed-Demo grammar

6 Experimental conditions
- Starting models:
  – standard Loquendo ASR EN-US
  – Telephone models (8 kHz)
  – Training set: LDC Macrophone
- Adaptation: first 50 utterances of each speaker
- Test: last 50 utterances of each speaker
- LM: Hiwire grammar (134-word vocabulary)
- Signal processing: down-sampling to 8 kHz
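The adaptation/test partition described on the slide above (first 50 utterances of each speaker for adaptation, last 50 for test) could be expressed as below. This is only a sketch: the function name `split_adaptation_test`, the speaker labels, and the utterance identifiers are hypothetical placeholders, not names from the Hiwire corpus.

```python
def split_adaptation_test(utterances_by_speaker, n_adapt=50):
    """Split each speaker's ordered utterance list into adaptation and test parts."""
    split = {}
    for speaker, utterances in utterances_by_speaker.items():
        split[speaker] = {
            "adapt": utterances[:n_adapt],   # first n_adapt utterances
            "test": utterances[n_adapt:],    # remaining utterances
        }
    return split


# Hypothetical corpus layout: 5 speakers, one list of 100 sentences each.
corpus = {
    f"spk{i}": [f"spk{i}_utt{j:03d}" for j in range(100)]
    for i in range(1, 6)
}
partition = split_adaptation_test(corpus)
for speaker, parts in partition.items():
    print(speaker, len(parts["adapt"]), len(parts["test"]))
```

Keeping the split per speaker means that adaptation and test material never overlap for a given speaker.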

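7 Results on Hiwire Noisy corpus (High noise)
Recognition model: ANN/HMM
Adaptation model: LIN - LHN
Table columns: Speaker | Default models (WA) | Adapted LIN (WA, ER %) | Adapted LHN (WA, ER %) | Adapted LIN+LHN (WA, ER %)
Table rows: five speakers (spk) and Average; the numeric values are missing from this transcript.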

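8 Results on Hiwire Noisy corpus (Low noise)
Recognition model: ANN/HMM
Adaptation model: LIN - LHN
Table columns: Speaker | Default models (WA) | Adapted LIN (WA, ER %) | Adapted LHN (WA, ER %) | Adapted LIN+LHN (WA, ER %)
Table rows: five speakers (spk) and Average; the numeric values are missing from this transcript.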
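For readers unfamiliar with the column headings, the sketch below shows how such figures are commonly computed. It assumes that WA is word accuracy in the standard sense (counting substitutions, deletions and insertions) and that ER % denotes relative error reduction with respect to the default models; the slides do not define either term, and the numbers in the example are made up purely for illustration.

```python
def word_accuracy(n_ref_words, substitutions, deletions, insertions):
    """Word accuracy in percent: 100 * (N - S - D - I) / N."""
    errors = substitutions + deletions + insertions
    return 100.0 * (n_ref_words - errors) / n_ref_words


def relative_error_reduction(wa_default, wa_adapted):
    """Percent of the default model's word errors removed by adaptation (assumed meaning of ER %)."""
    return 100.0 * (wa_adapted - wa_default) / (100.0 - wa_default)


# Illustrative numbers only, not results from the Hiwire experiments.
wa = word_accuracy(n_ref_words=200, substitutions=20, deletions=10, insertions=10)
er = relative_error_reduction(wa_default=40.0, wa_adapted=55.0)
print(f"WA = {wa:.1f} %, ER = {er:.1f} %")
```

Under this definition, ER % is measured against the remaining error of the default models, so the same absolute WA gain yields a larger ER % when the baseline is already accurate.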

9 Discussion
- In the case of the Hiwire Noisy DB there are 3 main problems:
  – Noise level
  – Non-native speakers
  – Channel: far-field microphone array + beamforming
- If the WA of the default models is too low (~20-30%), adaptation is unable to improve it because too many segmentation errors are present in the adaptation material
- If the WA of the default models is acceptable (> 40%), adaptation can improve performance
- On this corpus, where the channel + noise component is preponderant, LIN is in some cases better than LHN
- The combination LIN+LHN is always better than either technique alone

10 Workplan
- Selection of suitable benchmark databases (m6)
- Baseline set-up for the selected databases (m8)
- LIN adaptation method implemented and experimented on the benchmarks (m12)
- Experimental results on Hiwire database with LIN (m18)
- Innovative NN adaptation methods and algorithms for acoustic modeling and experimental results (m21)
- Further advances on new adaptation methods (m24)
- Unsupervised Adaptation: algorithms and experimentation (m33)