

Robust Speech Recognition
V. Barreaud, LORIA

Mismatch Between Training and Testing
- Mismatch between training and testing conditions degrades recognition scores
- Causes of mismatch:
  - Speech variation
  - Inter-speaker variation

Robust Approaches
Three categories:
- Noise-resistant features (speech variation)
- Speech enhancement (speech variation + inter-speaker variation)
- Model adaptation for noise (speech variation + inter-speaker variation)
[Figure: recognition system block diagram; training (Spk. A) builds the models, testing (Spk. B) goes through feature encoding and decoding into a word sequence]

Contents
- Overview
  - Noise-resistant features
  - Speech enhancement
  - Model adaptation
- Stochastic matching
- Our current work

Noise-Resistant Features
- Acoustic representation: emphasis on the least affected evidence
- Auditory-system-inspired models: filter banks, loudness curve, lateral inhibition
- Slow variation removal: cepstral mean normalization, time derivatives
- Linear discriminant analysis: searches for the best parameterization
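Two of the slow-variation-removal techniques named above can be sketched in a few lines. This is an illustrative sketch, not the presentation's implementation; the function names and the regression window size are assumptions.

```python
import numpy as np

def cepstral_mean_normalization(cepstra):
    """Remove the per-utterance cepstral mean, which cancels slowly
    varying channel effects.  cepstra: (T, D) frames of one utterance."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

def delta_features(cepstra, window=2):
    """Append first-order time derivatives computed by linear regression
    over +/- `window` frames (edge frames are replicated for padding)."""
    T, D = cepstra.shape
    padded = np.pad(cepstra, ((window, window), (0, 0)), mode="edge")
    num = sum(k * (padded[window + k:window + k + T]
                   - padded[window - k:window - k + T])
              for k in range(1, window + 1))
    den = 2 * sum(k * k for k in range(1, window + 1))
    return np.hstack([cepstra, num / den])
```

After normalization each cepstral dimension has zero mean over the utterance, and the appended deltas capture local spectral dynamics.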

Speech Enhancement
- Parameter mapping: stereo data; observation subspace
- Bayesian estimation: stochastic modeling of speech and noise
- Template-based estimation: restriction to a subspace; output is noise-free; various templates and combination methods
- Spectral subtraction: assumes noise and speech are uncorrelated and the noise varies slowly

Model Adaptation for Noise
- HMM decomposition or parallel model combination (PMC):
  - The Viterbi algorithm searches an N x M state HMM
  - Noise and speech are recognized simultaneously
  - Complex noises can be recognized
- State-dependent Wiener filtering:
  - Wiener filtering in the spectral domain faces non-stationarity
  - HMMs divide speech into quasi-stationary segments
  - A Wiener filter is specific to each state
- Discriminative training:
  - The classical technique trains models independently
  - Error-corrective training; minimum classification error training
- Training data contamination:
  - Training set corrupted with noisy speech
  - Depends on the test environment; lowers discrimination
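The state-dependent Wiener filtering bullet can be made concrete with a minimal sketch. The function names are illustrative; the gain formula is the standard Wiener gain, with the speech power spectrum supplied per HMM state as the slide describes.

```python
import numpy as np

def wiener_gain(speech_psd, noise_psd):
    """Wiener filter gain H = S / (S + N) per frequency bin."""
    return speech_psd / (speech_psd + noise_psd)

def state_dependent_filter(noisy_psd, state_speech_psd, noise_psd):
    """Filter a noisy spectrum with a gain built from the current HMM
    state's speech PSD, so each quasi-stationary segment gets its own
    filter rather than one global, stationarity-assuming filter."""
    return wiener_gain(state_speech_psd, noise_psd) * noisy_psd
```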

Stochastic Matching: Introduction
- General framework
- In feature space
- In model space

Stochastic Matching: General Framework
- HMM models Lambda_X trained on the training space X
- Y = {y_1, ..., y_T}: observations in the testing space
- Goal: find the transformation F_nu and word sequence W that jointly best explain Y under Lambda_X
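The original equation on this slide did not survive extraction; following the standard stochastic matching formulation, the joint objective can be written as:

```latex
% Jointly estimate the distortion parameters \nu and the word sequence W
% that best explain the transformed observations under the clean models:
(\hat{\nu}, \hat{W}) = \operatorname*{argmax}_{\nu,\, W} \;
    p\bigl(F_{\nu}(Y),\, W \mid \Lambda_X\bigr)
```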

Stochastic Matching: In Feature Space
- Estimation step: auxiliary function Q(nu' | nu) (EM)
- Maximization step: update nu to maximize the auxiliary function

Stochastic Matching: In Feature Space (2)
- Simple distortion function: an additive cepstral bias, z_t = y_t - b
- Computation of the simple bias in closed form at each M-step
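The bias computation can be sketched as an EM loop. This is a sketch under stated assumptions, not the presentation's code: the HMM state/mixture densities are stood in for by a diagonal-covariance GMM, and the M-step uses the standard closed-form variance-weighted residual average for an additive bias.

```python
import numpy as np

def estimate_bias(Y, means, variances, weights, n_iter=5):
    """EM estimate of a single additive bias b such that z_t = y_t - b
    best fits a diagonal-covariance GMM (stand-in for the HMM densities).

    Y: (T, D) test frames; means, variances: (M, D); weights: (M,)"""
    b = np.zeros(Y.shape[1])
    for _ in range(n_iter):
        Z = Y - b                                   # current compensated frames
        # E-step: component posteriors gamma_t(m) under the compensated frames
        log_p = (-0.5 * (((Z[:, None, :] - means) ** 2) / variances
                         + np.log(2 * np.pi * variances)).sum(-1)
                 + np.log(weights))
        log_p -= log_p.max(axis=1, keepdims=True)
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: closed-form bias (posterior- and variance-weighted residuals)
        w = gamma[:, :, None] / variances           # (T, M, D)
        b = (w * (Y[:, None, :] - means)).sum((0, 1)) / w.sum((0, 1))
    return b
```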

Stochastic Matching: In Model Space
- Random additive bias sequence B = {b_1, ..., b_T}, independent of the speech
- Stochastic process with mean mu_b and diagonal covariance Sigma_b
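Under the additive-bias assumptions above, compensating in model space amounts to shifting every Gaussian of the HMM; this is the standard consequence of adding an independent Gaussian bias, written out here since the slide's equations did not survive extraction:

```latex
% Each Gaussian (state n, component m) absorbs the bias statistics:
\mu'_{n,m} = \mu_{n,m} + \mu_b, \qquad
\Sigma'_{n,m} = \Sigma_{n,m} + \Sigma_b
```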

On-Line Frame-Synchronous Noise Compensation
- Builds on the stochastic matching method
- The transformation parameter is estimated along with the optimal path
- Uses forward probabilities
[Figure: observations y_t feed the bias computation b_t; the transformed observations z_t go to the recognizer]

Theoretical Framework and Issue
- On-line frame-synchronous algorithm:
  1. Initialize the bias of the first frame: b_0 = 0
  2. Compute gamma, then b
  3. Transform the next frame with b
  4. Go to the next frame
- Issue: a cascade of errors
- Compare with classical (batch) stochastic matching
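The four steps above can be sketched as a running loop. This is a simplified sketch: the slide's forward probabilities along the decoding path are stood in here by static GMM posteriors, and all names are illustrative.

```python
import numpy as np

def frame_synchronous_compensation(Y, means, variances, weights):
    """On-line bias compensation: each incoming frame is corrected with
    the bias accumulated from earlier frames, then the bias is updated
    from the current frame's component posteriors."""
    D = Y.shape[1]
    b = np.zeros(D)                       # step 1: initial bias is zero
    num = np.zeros(D)
    den = np.zeros(D)
    Z = []
    for y in Y:
        z = y - b                         # step 3: transform current frame
        # step 2: posteriors gamma (stand-in for forward probabilities)
        log_p = (-0.5 * (((z - means) ** 2) / variances
                         + np.log(2 * np.pi * variances)).sum(-1)
                 + np.log(weights))
        log_p -= log_p.max()
        gamma = np.exp(log_p)
        gamma /= gamma.sum()
        num += (gamma[:, None] / variances * (y - means)).sum(0)
        den += (gamma[:, None] / variances).sum(0)
        b = num / den                     # running bias estimate
        Z.append(z)                       # step 4: next frame
    return np.array(Z), b
```

Because the bias is estimated from past frames only, an early misestimate propagates into later frames: the "cascade of errors" the slide names.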

Viterbi Hypothesis vs. Linear Combination
- Viterbi hypothesis: takes into account only the "most probable" state and Gaussian component
- Linear combination: weights all states and Gaussian components by their probabilities
[Figure: trellis of states between t and t+1]
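The contrast between the two options can be shown in one small function; the name and interface are illustrative, with per-component residuals and posteriors as inputs.

```python
import numpy as np

def combine_bias(gamma, residuals, viterbi=False):
    """Per-frame bias contribution from state/component residuals.

    gamma: (M,) posteriors; residuals: (M, D) per-component (y - mu).
    The Viterbi hypothesis keeps only the most probable component; the
    linear combination weights every component by its posterior."""
    if viterbi:
        return residuals[np.argmax(gamma)]
    return (gamma[:, None] * residuals).sum(0)
```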

Experiments
- Phone numbers recorded in a running car
- Forced alignment: transcription + optimum path
- Free alignment: optimum path only
- Wild alignment: no data

Perspectives
- Error recovery problem:
  - A forgetting process
  - A model of the distortion function
  - Environmental clues
- More elaborate transforms