Adapting Hybrid ANN/HMM to Speech Variations

Stefano Scanzio, Pietro Laface (Politecnico di Torino)
Dario Albesano, Roberto Gemello, Franco Mana
Acoustic Model Adaptation

- Adaptation tasks
- Linear Input Network (LIN)
- Linear Hidden Network (LHN)
- Catastrophic forgetting
- Conservative Training
- Results on several adaptation tasks

ICASSP 2006
Acoustic Model Adaptation

Adaptation targets:
- A specific speaker
- Speaking style (spontaneous speech, regional accents)
- Audio channel (telephone, cellular, microphone)
- Environment (car, office, ...)
- A specific vocabulary

(Figure: data logged by the voice application's ASR feed ANN adaptation, turning task-independent models into adapted models.)
Linear Input Network (LIN) Adaptation

(Figure: a LIN is inserted between the speech parameters and the input layer of the speaker/task-independent MLP; the MLP's output layer gives the emission probabilities of the acoustic-phonetic units.)
Linear Hidden Network (LHN)

(Figure: the LHN is inserted above hidden layer 2 of the MLP, before the output layer; speech parameters feed the input layer, hidden layers 1 and 2 follow, and the output layer gives the emission probabilities of the acoustic-phonetic units.)
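Both schemes amount to inserting an extra trainable linear layer into the trained MLP, initialized to the identity so that adaptation starts from the original network's behavior. A minimal NumPy sketch (the layer sizes, the `forward` helper, and the tiny single-hidden-layer network are illustrative assumptions, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, layers):
    """Run x through a list of (W, b, activation) layers."""
    for W, b, act in layers:
        x = act(W @ x + b)
    return x

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
identity = lambda z: z

# Tiny frozen speaker-independent MLP: 4 inputs -> 8 hidden -> 3 outputs.
W1, b1 = rng.standard_normal((8, 4)), np.zeros(8)
W2, b2 = rng.standard_normal((3, 8)), np.zeros(3)
mlp = [(W1, b1, sigmoid), (W2, b2, identity)]

# LIN: identity-initialized linear layer before the input layer.
lin = (np.eye(4), np.zeros(4), identity)
lin_net = [lin] + mlp

# LHN: identity-initialized linear layer after the hidden layer,
# before the output layer.
lhn = (np.eye(8), np.zeros(8), identity)
lhn_net = [mlp[0], lhn, mlp[1]]

# At identity initialization both adapted networks reproduce the
# original outputs; only the inserted layer is then trained.
x = rng.standard_normal(4)
assert np.allclose(forward(x, mlp), forward(x, lin_net))
assert np.allclose(forward(x, mlp), forward(x, lhn_net))
```

During adaptation only the inserted layer's weights are updated, so the number of adapted parameters stays small compared with retraining the whole MLP.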
Catastrophic Forgetting

- Acquiring new information can damage previously learned information when the new data do not adequately represent the knowledge captured by the original training data.
- The effect is most evident when the adaptation data contain no examples for a subset of the output classes.
- The problem is more severe in the ANN framework than in the Gaussian-mixture HMM framework.
Catastrophic Forgetting

- The back-propagation algorithm penalizes classes with no adaptation examples by setting their target value to zero for every adaptation frame.
- During adaptation, the ANN weights are therefore biased to favor the activations of the classes that have samples in the adaptation set, weakening all the other classes.
16-Class Training Example

- 2 input nodes, 20 x 2 hidden nodes, 16 output nodes
- 2500 training patterns per class
- The adaptation set includes 5000 patterns belonging only to classes 6 and 7

(Figures: decision regions around classes 6 and 7 before adaptation; error rates 1.5% and 6.7%, total error rate 4.1%.)
Adaptation of 2 Classes

(Figures: decision regions after standard adaptation on classes 6 and 7 only; error rates 0% and 2.0% on the adapted classes, but the total error rate grows to 16.9%.)
Conservative Training Target Assignment Policy

Px: class with examples in the adaptation set; Mx: missing class. P2 is the class corresponding to the current input frame.

Class                           M1     P1     P2     P3     M2
Standard target assignment      0.00   0.00   1.00   0.00   0.00
Conservative Training targets   0.03   0.00   0.95   0.00   0.02

Under Conservative Training, the targets of the missing classes are the posterior probabilities computed by the original network.
Adaptation of 2 Classes: Standard vs. Conservative Training

- Standard adaptation: error rates 0% and 2% on classes 6 and 7, but total error rate 16.9%
- Conservative Training adaptation: error rates 2.2% and 5.2% on classes 6 and 7, and total error rate 10.2%
Adaptation Tasks

- Application data adaptation: Directory Assistance — 9325 Italian city names; 53713 training and 3917 test utterances
- Vocabulary adaptation: command words — 30 command words; 6189 training and 3094 test utterances
- Channel-environment adaptation: Aurora-3 — 2951 training and 654 test utterances
- Speaker adaptation: WSJ0 — 8 speakers, 16 kHz; 40 training and 40 test sentences
Results on Different Tasks (% WER)

Adaptation method   Application:            Vocabulary:      Channel-Environment:
                    Directory Assistance    Command Words    Aurora-3 CH1
No adaptation       14.6                    3.8              24.0
LIN                 11.2                    3.4              11.0
LIN + CT            12.4                    -                15.3
LHN                  9.6                    2.1               9.8
LHN + CT            10.1                    2.3              10.4
Mitigation of Catastrophic Forgetting Using Conservative Training

Tests of the adapted models on Italian continuous speech (% WER); without adaptation: 29.3.

Adaptation method   Adapted on:             Adapted on:      Adapted on:
                    Directory Assistance    Command Words    Aurora-3 CH1
LIN                 36.3                    42.7             108.6
LIN + CT            36.5                    35.2              42.1
LHN                 40.6                    63.7             152.1
LHN + CT            40.7                    45.3              44.2
Networks Used in the Speaker Adaptation Task

- STD (Standard): 2-hidden-layer hybrid MLP-HMM model; 273 input features (39 parameters x 7 context frames)
- IMP (Improved): uses a wider input window spanning a time context of 25 frames, and includes an additional hidden layer
Results on the WSJ0 Speaker Adaptation Task (% WER, trigram LM)

Adaptation method   STD    IMP
No adaptation       8.4    6.5
LIN                 7.9    -
LIN + CT            7.1    -
LHN + CT            6.6    5.6
LIN + LHN + CT      6.3    5.0
Conclusions

- LHN adaptation outperforms LIN adaptation.
- Linear transformations at different levels produce different positive effects: LIN + LHN performs better than LHN alone.
- In adaptation tasks with missing classes, Conservative Training reduces the catastrophic forgetting effect, preserving performance on another generic task.
- Conservative Training also improves performance in speaker adaptation when few sentences are available.
Weight Merging

(Figure: the trained LIN, sitting between the input I and Hidden 1, is folded into the Hidden 1 weights; layers shown: I, LIN, Hidden 1, Hidden 2, O.)
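Because the LIN and the first hidden layer's affine transform compose into a single affine transform, the trained LIN can be folded into the hidden-layer weights, restoring the original topology at no runtime cost. A hypothetical NumPy sketch of this merging (sizes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Frozen first hidden layer of the original MLP, and a trained LIN
# (illustrative sizes: 4 input features, 8 hidden nodes).
W1, b1 = rng.standard_normal((8, 4)), rng.standard_normal(8)
A, a = rng.standard_normal((4, 4)), rng.standard_normal(4)  # trained LIN

# Both transforms are affine, so they compose:
#   W1 @ (A @ x + a) + b1 == (W1 @ A) @ x + (W1 @ a + b1)
W_merged = W1 @ A
b_merged = W1 @ a + b1

# The merged layer reproduces LIN followed by Hidden 1 exactly.
x = rng.standard_normal(4)
h_two_layers = sigmoid(W1 @ (A @ x + a) + b1)
h_merged = sigmoid(W_merged @ x + b_merged)
assert np.allclose(h_two_layers, h_merged)
```

The same algebra applies to an LHN placed between two layers: the identity-initialized linear transform can be absorbed into the weights of the layer above it.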
Conservative Training (CT)

For each observation frame:
1) Set the target value of each class that has no (or few) adaptation data to its posterior probability computed by the original network.
2) Set to zero the target value of each class that has adaptation data but does not correspond to the input frame.
3) Set the target value of the class corresponding to the input frame to 1 minus the sum of the posterior probabilities assigned by rule 1.
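The three rules above can be sketched directly in a few lines. The function name and the posteriors of the present classes are illustrative; the missing-class posteriors 0.03 and 0.02 follow the earlier example with classes M1, P1, P2, P3, M2:

```python
def conservative_targets(posteriors, present_classes, current_class):
    """Conservative Training targets for one observation frame.

    posteriors      -- outputs of the ORIGINAL (unadapted) network
    present_classes -- indices of classes that have adaptation data
    current_class   -- index of the class of the current input frame
    """
    targets = [0.0] * len(posteriors)
    missing_mass = 0.0
    for c, p in enumerate(posteriors):
        if c not in present_classes:       # rule 1: missing class
            targets[c] = p
            missing_mass += p
        # rule 2: present classes other than the current one keep target 0
    targets[current_class] = 1.0 - missing_mass   # rule 3
    return targets

# Classes [M1, P1, P2, P3, M2]; the current frame belongs to P2.
# Posteriors of the present classes (0.60, 0.30, 0.05) are illustrative.
t = conservative_targets([0.03, 0.60, 0.30, 0.05, 0.02],
                         present_classes={1, 2, 3}, current_class=2)
# t is approximately [0.03, 0.0, 0.95, 0.0, 0.02]
```

Since the targets still sum to 1, the adapted network keeps assigning probability mass to the missing classes instead of driving their outputs to zero.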
Conclusions on LHN

- LHN outperforms LIN.
- Linear transformations at different levels produce different positive effects: LIN + LHN performs better than LHN alone.
- For continuous speech, the wide-input IMP network is better than the STD one.