Robust Speech Recognition
V. Barreaud, LORIA
Mismatch Between Training and Testing
- Mismatch influences scores
- Causes of mismatch
  - Speech variation
  - Inter-speaker variation
Robust Approaches
- Three categories
  - Noise-resistant features (speech var.)
  - Speech enhancement (speech var. + inter-speaker var.)
  - Model adaptation for noise (speech var. + inter-speaker var.)
[Figure: recognition system (training, testing, feature encoding, models, word sequence; Spk. A, Spk. B)]
Contents
- Overview
  - Noise-resistant features
  - Speech enhancement
  - Model adaptation
- Stochastic Matching
- Our current work
Noise-Resistant Features
- Acoustic representation
  - Emphasis on the evidence least affected by noise
- Models inspired by the auditory system
  - Filter banks, loudness curve, lateral inhibition
- Removal of slow variations
  - Cepstral Mean Normalization, time derivatives
- Linear Discriminant Analysis
  - Searches for the best parameterization
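As an illustration of slow-variation removal, here is a minimal sketch of Cepstral Mean Normalization, assuming the cepstral frames of one utterance are given as plain lists of floats. The function name is illustrative and not taken from the slides.

```python
from typing import List


def cepstral_mean_normalization(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-coefficient utterance mean from every cepstral frame.

    A constant (or very slowly varying) additive offset in the cepstral
    domain, such as a fixed channel effect, is removed by this subtraction.
    """
    if not frames:
        return []
    n_frames = len(frames)
    dim = len(frames[0])
    # Mean of each cepstral coefficient over the whole utterance.
    mean = [sum(frame[i] for frame in frames) / n_frames for i in range(dim)]
    return [[frame[i] - mean[i] for i in range(dim)] for frame in frames]


# Usage: the same utterance with and without a constant cepstral offset.
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
offset = [0.5, -1.0]
shifted = [[c + o for c, o in zip(frame, offset)] for frame in clean]
normalized_clean = cepstral_mean_normalization(clean)
normalized_shifted = cepstral_mean_normalization(shifted)
```

Both normalized sequences coincide, which is the sense in which the feature is insensitive to a constant offset.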
Speech Enhancement
- Parameter mapping
  - Stereo data
  - Observation subspace
- Bayesian estimation
  - Stochastic modeling of speech and noise
- Template-based estimation
  - Restriction to a subspace
  - Output is noise-free
  - Various templates and combination methods
- Spectral subtraction
  - Noise and speech are uncorrelated
  - Noise varies slowly
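The two assumptions behind spectral subtraction (uncorrelated speech and noise, slowly varying noise) can be sketched as follows. This is a minimal power-domain version; the spectral floor and the function names are assumptions added for illustration, not details from the slides.

```python
from typing import List


def estimate_noise_power(noise_frames: List[List[float]]) -> List[float]:
    """Average power spectra of frames assumed to contain noise only.

    Averaging is reasonable only if the noise varies slowly.
    """
    n_frames = len(noise_frames)
    n_bins = len(noise_frames[0])
    return [sum(f[k] for f in noise_frames) / n_frames for k in range(n_bins)]


def spectral_subtraction(noisy_power: List[float],
                         noise_power: List[float],
                         floor: float = 0.01) -> List[float]:
    """Estimate clean speech power as noisy power minus noise power.

    If speech and noise are uncorrelated, their powers add, so the noise
    estimate can be subtracted. The floor (a hypothetical parameter here)
    keeps the result positive when the noise is overestimated.
    """
    return [max(y - n, floor * y) for y, n in zip(noisy_power, noise_power)]


# Usage with toy two-bin power spectra.
noise = estimate_noise_power([[1.0, 2.0], [3.0, 4.0]])
enhanced = spectral_subtraction([4.0, 1.0], [1.0, 2.0], floor=0.1)
```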
Model Adaptation for Noise
- HMM decomposition or Parallel Model Combination (PMC)
  - The Viterbi algorithm searches an N×M-state HMM
  - Noise and speech are recognized simultaneously
  - Complex noises can be recognized
- State-dependent Wiener filtering
  - Wiener filtering in the spectral domain struggles with non-stationarity
  - HMMs divide speech into quasi-stationary segments
  - Wiener filters are specific to each state
- Discriminative training
  - The classical technique trains models independently
  - Error-corrective training
  - Minimum classification error training
- Training data contamination
  - Training set corrupted with noisy speech
  - Depends on the test environment
  - Lower discriminative scores
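To make the idea of state-dependent Wiener filtering concrete, the sketch below builds one Wiener gain per HMM state from a per-state speech power spectrum and a shared noise power spectrum. The per-state spectra, state labels and function names are hypothetical; the slides do not say how these quantities are obtained.

```python
from typing import Dict, List


def wiener_gain(speech_power: List[float], noise_power: List[float]) -> List[float]:
    """Classical Wiener gain per frequency bin: S / (S + N)."""
    return [s / (s + n) if (s + n) > 0.0 else 0.0
            for s, n in zip(speech_power, noise_power)]


def state_dependent_filters(state_speech_power: Dict[str, List[float]],
                            noise_power: List[float]) -> Dict[str, List[float]]:
    """One filter per state, since speech is quasi-stationary within a state."""
    return {state: wiener_gain(power, noise_power)
            for state, power in state_speech_power.items()}


def apply_filter(noisy_spectrum: List[float], gain: List[float]) -> List[float]:
    """Scale each frequency bin of the noisy spectrum by the state's gain."""
    return [y * g for y, g in zip(noisy_spectrum, gain)]


# Usage: two hypothetical states with different spectral shapes.
filters = state_dependent_filters({"a": [3.0, 1.0], "b": [1.0, 3.0]}, [1.0, 1.0])
filtered = apply_filter([4.0, 2.0], filters["a"])
```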
Stochastic Matching: Introduction
- General framework
- In feature space
- In model space
Stochastic Matching: General Framework
- HMM models Λ_X, trained in the training space X
- Y = {y_1, …, y_t}: observation sequence in the testing space
- [equation lost in extraction; it involves Y and W]
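For reference, the usual maximum-likelihood formulation of stochastic matching (Sankar and Lee) can be written as below. Whether the slide used exactly this notation is an assumption: here F_ν is a feature-space transform with parameters ν, G_η a model-space transform with parameters η, Λ_X the trained models, and W the word sequence.

```latex
% Feature space: map test observations back towards the training space
X = F_{\nu}(Y)
\qquad
(\nu', W') = \arg\max_{(\nu, W)} \; p(Y, W \mid \nu, \Lambda_X)

% Model space: map the trained models towards the testing space
\Lambda_Y = G_{\eta}(\Lambda_X)
\qquad
(\eta', W') = \arg\max_{(\eta, W)} \; p(Y, W \mid \eta, \Lambda_X)
```

In both cases the transform parameters and the recognized word sequence are estimated jointly, which in practice is done by alternating between decoding W and re-estimating the parameters.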
Stochastic Matching: In Feature Space
- Estimation step: auxiliary function
- Maximization step
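The formulas of this slide did not survive extraction. As a hedged reconstruction under standard assumptions (Gaussian mixture state densities with diagonal covariances, feature transform x_t = f_ν(y_t) with Jacobian J_ν), the EM auxiliary function and its maximization take the following form, where γ_t(n, m) is the posterior probability of state n and mixture component m at time t, computed with the current parameters ν.

```latex
% Estimation step: auxiliary function
Q(\nu' \mid \nu)
  = E\left[ \log p(Y, S, C \mid \nu', \Lambda_X) \,\middle|\, Y, \nu, \Lambda_X \right]
  = \text{const}
    - \frac{1}{2} \sum_{t} \sum_{n} \sum_{m} \gamma_t(n, m)
      \sum_{i} \frac{\left( f_{\nu'}(y_t)_i - \mu_{nm,i} \right)^2}{\sigma^2_{nm,i}}
    + \sum_{t} \log \left| J_{\nu'}(y_t) \right|

% with
\gamma_t(n, m) = p(s_t = n, c_t = m \mid Y, \nu, \Lambda_X)

% Maximization step
\nu' = \arg\max_{\nu'} \; Q(\nu' \mid \nu)
```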
Stochastic Matching: In Feature Space (2)
- Simple distortion function
- Computation of the simple bias
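For the simple bias case, assuming the distortion function is x_t = y_t - b and the models are Gaussians with diagonal covariances, the maximization step has a closed form: each bias component is a posterior- and precision-weighted average of the residuals y_t - μ. The sketch below flattens (state, mixture) pairs into a single component index; the function name and argument layout are illustrative assumptions.

```python
from typing import List


def estimate_bias(observations: List[List[float]],
                  posteriors: List[List[float]],
                  means: List[List[float]],
                  variances: List[List[float]]) -> List[float]:
    """Closed-form bias estimate for the distortion x_t = y_t - b.

    observations: T frames of dimension D (test-space features y_t)
    posteriors:   T rows of K weights, gamma_t(k) for each Gaussian k
    means, variances: K Gaussians of dimension D (diagonal covariance)
    """
    dim = len(observations[0])
    num = [0.0] * dim
    den = [0.0] * dim
    for y, gammas in zip(observations, posteriors):
        for g, mu, var in zip(gammas, means, variances):
            for i in range(dim):
                num[i] += g * (y[i] - mu[i]) / var[i]
                den[i] += g / var[i]
    return [n / d for n, d in zip(num, den)]


# Usage: two Gaussians, each frame fully assigned to one of them.
bias = estimate_bias(observations=[[1.0], [11.0]],
                     posteriors=[[1.0, 0.0], [0.0, 1.0]],
                     means=[[0.0], [10.0]],
                     variances=[[1.0], [1.0]])
```

In a full EM loop the posteriors would be recomputed from the compensated features y_t - b after each update.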
Stochastic Matching: In Model Space
- Random additive bias sequence B = {b_1, …, b_t}, independent of speech
- Stochastic process of mean μ_b and diagonal covariance Σ_b
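If the test observation is y = x + b with a Gaussian bias independent of the speech, each Gaussian of the model is compensated by adding the bias mean to its mean and the bias variance to its variance. A minimal sketch for one diagonal-covariance Gaussian, with illustrative names:

```python
from typing import List, Tuple


def compensate_gaussian(mean: List[float],
                        variance: List[float],
                        bias_mean: List[float],
                        bias_variance: List[float]) -> Tuple[List[float], List[float]]:
    """Model-space compensation for y = x + b, with b independent of x.

    The sum of independent Gaussians is Gaussian: means add, variances add.
    """
    new_mean = [m + mb for m, mb in zip(mean, bias_mean)]
    new_variance = [v + vb for v, vb in zip(variance, bias_variance)]
    return new_mean, new_variance


# Usage on a two-dimensional Gaussian.
adapted_mean, adapted_variance = compensate_gaussian(
    mean=[1.0, 2.0], variance=[1.0, 0.5],
    bias_mean=[0.5, -1.0], bias_variance=[0.25, 0.5])
```

Unlike the feature-space bias, this form also widens the model variances, which lets it account for a bias that fluctuates from frame to frame.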
On-Line Frame-Synchronous Noise Compensation
- Relies on the stochastic matching method
- The transformation parameter is estimated along with the optimal path
- Uses forward probabilities
[Figure: sequence of observations (y_2 … y_4), bias computation (b_1 … b_4), transformed observations (z_2 … z_5), recognition]
Theoretical Framework and Issue
- On-line, frame-synchronous
- Cascade of errors
  1. Initialize the bias of the first frame: b_0 = 0
  2. Compute [quantity lost in extraction] and then b
  3. Transform the next frame with b
  4. Go to the next frame
- Classical Stochastic Matching
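The four steps can be sketched as a simplified frame-synchronous bias tracker. As a loud simplification, this sketch uses per-frame posteriors over a flat set of diagonal Gaussians in place of the HMM forward probabilities, and it accumulates bias statistics over all past frames; both choices, and all names, are assumptions rather than the authors' method.

```python
import math
from typing import List, Tuple


def log_gaussian(x: List[float], mean: List[float], var: List[float]) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def component_posteriors(x, means, variances) -> List[float]:
    """Posterior of each Gaussian given x (equal priors assumed)."""
    logs = [log_gaussian(x, m, v) for m, v in zip(means, variances)]
    peak = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(l - peak) for l in logs]
    total = sum(weights)
    return [w / total for w in weights]


def online_bias_compensation(observations, means, variances
                             ) -> Tuple[List[List[float]], List[List[float]]]:
    dim = len(observations[0])
    bias = [0.0] * dim                      # step 1: b_0 = 0
    num, den = [0.0] * dim, [0.0] * dim
    transformed, biases = [], []
    for y in observations:
        # step 3: transform the frame with the bias from the previous frame
        z = [yi - bi for yi, bi in zip(y, bias)]
        transformed.append(z)
        # step 2: posteriors from the compensated frame, then update the bias
        gammas = component_posteriors(z, means, variances)
        for g, mu, var in zip(gammas, means, variances):
            for i in range(dim):
                num[i] += g * (y[i] - mu[i]) / var[i]
                den[i] += g / var[i]
        bias = [n / d for n, d in zip(num, den)]
        biases.append(bias)                 # step 4: move on to the next frame
    return transformed, biases


# Usage: one Gaussian at 0, observations shifted by a constant bias of 2.
frames, bias_track = online_bias_compensation([[2.0], [2.0], [2.0]],
                                              means=[[0.0]], variances=[[1.0]])
```

Because each bias is computed from posteriors that themselves depend on the previous bias, an early misassignment propagates to later frames, which is the cascade-of-errors issue named on the slide.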
Viterbi Hypothesis vs. Linear Combination
- The Viterbi hypothesis takes into account only the "most probable" state and Gaussian component
- Linear combination
[Figure: states at frames t and t+1]
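The difference between the two options can be illustrated on a single frame: the Viterbi hypothesis puts all the weight on the most probable component, whereas the linear combination weights every component by its probability. The sketch below assumes diagonal Gaussians and a bias of the form y - b; the names and the toy numbers are illustrative only.

```python
from typing import List


def viterbi_weights(posteriors: List[float]) -> List[float]:
    """Hard decision: weight 1 on the most probable component, 0 elsewhere."""
    best = max(range(len(posteriors)), key=posteriors.__getitem__)
    return [1.0 if k == best else 0.0 for k in range(len(posteriors))]


def frame_bias(y: List[float], weights: List[float],
               means: List[List[float]], variances: List[List[float]]) -> List[float]:
    """Bias estimate from one frame, as a weighted average of y - mean."""
    dim = len(y)
    num, den = [0.0] * dim, [0.0] * dim
    for w, mu, var in zip(weights, means, variances):
        for i in range(dim):
            num[i] += w * (y[i] - mu[i]) / var[i]
            den[i] += w / var[i]
    return [n / d for n, d in zip(num, den)]


# Usage: an ambiguous frame lying between two components.
posteriors = [0.6, 0.4]
means, variances = [[0.0], [10.0]], [[1.0], [1.0]]
y = [4.0]
bias_viterbi = frame_bias(y, viterbi_weights(posteriors), means, variances)
bias_linear = frame_bias(y, posteriors, means, variances)
```

With the hard decision the whole residual to the winning component is attributed to the bias, while the linear combination hedges between the two hypotheses, so a wrong choice of component costs less.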
Experiments
- Phone numbers in a running car
- Forced Align
  - Transcription + optimum path
- Free Align
  - Optimum path
- Wild Align
  - No data
Perspectives
- Error recovery problem
  - A forgetting process
  - A model of the distortion function
  - Environmental clues
- More elaborate transforms