1
HIWIRE MEETING Athens, November 3-4, 2005 José C. Segura, Ángel de la Torre
2
HIWIRE Meeting – Athens, 3 - 4 November, 2005
Schedule
  HIWIRE database evaluations
  Non-linear feature normalization
    ECDF segmental implementation
    Parametric equalization
  Robust VAD
    Bispectrum-based VAD
  Model-based feature compensation
    VTS results on AURORA4
    Including uncertainty caused by noise
3
HIWIRE database evaluations
  PARAMETERS: MFCC_0_D_A_Z (39 components)
  MODELS:
    TIMIT: 46 phone models / 3 states / 128 Gaussians (17,664 Gaussians)
    WSJ16k: 16,825 triphones / 3,608 tied states / 6 Gaussians (21,648 Gaussians)
    WSJ16kFon: 40 phone models / 3 states / 128 Gaussians (15,360 Gaussians)
  ADAPTATION: MLLR, 32 regression classes / 50 adaptation utterances
  GRAMMAR: LORIA & Word-Loop
  MODIFICATIONS: Some transcriptions have been modified to match the grammar definition
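The Gaussian totals quoted for each model set follow from multiplying the number of HMM states by the number of Gaussians per state. A minimal check of that arithmetic (the dictionary layout and variable names are illustrative, not part of the original slides):

```python
# Total Gaussians per acoustic model set = HMM states x Gaussians per state.
# Figures are taken from the slide; the data layout is only an illustration.
model_sets = {
    "TIMIT": (46 * 3, 128),      # 46 phone models x 3 states each
    "WSJ16k": (3608, 6),         # tied states after triphone clustering
    "WSJ16kFon": (40 * 3, 128),  # 40 phone models x 3 states each
}

totals = {name: states * mixtures for name, (states, mixtures) in model_sets.items()}

for name, total in totals.items():
    print(f"{name}: {total} Gaussians")
```

The three sets end up with comparable numbers of Gaussians, which keeps the comparison between monophone and triphone systems reasonably balanced.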
4
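Transcription modifications
    # awk script: normalizes each transcription line to match the grammar definition.
    BEGIN {
        lista = LISTA;   # LISTA is expected to be defined outside the script (e.g. with awk -v); lista is not used below
        nfrase = 0;      # counter of processed sentences
    }
    {
        linea = $0;
        gsub("-", "_", linea);                                       # hyphens become underscores
        gsub("Due_to_", "Due_to ", linea);                           # split "Due_to" from the following word
        gsub("Mayday_Mayday", "Mayday Mayday", linea);               # repeated call words stay separate
        gsub("Pan_Pan", "Pan Pan", linea);
        gsub("three hundred twenty", "three_hundred_twenty", linea); # compound numbers become one token
        gsub("one hundred sixty", "one_hundred_sixty", linea);
        printf("%s\n", tolower(linea));                              # output in lower case
        nfrase = nfrase + 1;
    }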
5
HIWIRE database results
  RESULTS WITHOUT ADAPTATION (WER)
  RESULTS WITH MLLR (WER)
6
Schedule
  HIWIRE database evaluations
  Non-linear feature normalization
    ECDF segmental implementation
    Parametric equalization
  Robust VAD
    Bispectrum-based VAD
  Model-based feature compensation
    VTS results on AURORA4
    Including uncertainty caused by noise
7
ECDF segmental implementation
  Provided LOQUENDO with a reference C implementation of the segmental Gaussian transformation, to be tested within the LOQUENDO recognizer
  Current work
    Nonlinear feature transformation with a clean reference, to avoid the problem of system retraining
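The reference C implementation mentioned on this slide is not reproduced here. As a rough illustration of what a segmental, ECDF-based Gaussian transformation can look like, the sketch below maps each value of one feature trajectory to a standard normal quantile using the empirical CDF of a window centred on it. The function name, the window length and the `(rank - 0.5) / N` ECDF estimate are assumptions made for this example.

```python
from statistics import NormalDist


def segmental_gaussianize(values, half_window=100):
    """Map each value to N(0, 1) via the ECDF of a window centred on it.

    Hypothetical sketch: one feature component at a time, with the window
    truncated at the edges of the utterance.
    """
    std_normal = NormalDist()  # zero mean, unit variance reference
    n = len(values)
    out = []
    for t, v in enumerate(values):
        lo = max(0, t - half_window)
        hi = min(n, t + half_window + 1)
        window = values[lo:hi]
        # 1-based rank of the current value among the values in its window
        rank = sum(1 for w in window if w < v) + 1
        ecdf = (rank - 0.5) / len(window)  # strictly inside (0, 1)
        out.append(std_normal.inv_cdf(ecdf))
    return out


if __name__ == "__main__":
    print(segmental_gaussianize([3.0, 1.0, 2.0], half_window=10))
```

Because only ranks inside the window are used, the transformation is monotonic within a window and needs no retraining data at test time; a clean reference other than N(0, 1) would replace `std_normal` by the corresponding inverse CDF.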
8
Parametric Equalization (1)
  HEQ (histogram equalization) limitations
    Influence of the relative amount of silence in utterances
    With a parametric model, a more robust equalization can be obtained
  "Parametric Nonlinear Feature Equalization for Robust Speech Recognition" (submitted to ICASSP'06)
9
Parametric Equalization (2)
  CLASS-DEPENDENT LINEAR EQUALIZATION
  SOFT DECISION VAD (two-class Gaussian classifier on C0)
  NONLINEAR INTERPOLATION
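The equations of this slide did not survive extraction. The three ingredients it names can be sketched as follows: a two-class Gaussian classifier on C0 gives a soft speech/silence decision, each class has its own linear (mean and variance) equalization towards a clean reference, and the two linear outputs are interpolated with the class posteriors, which makes the overall mapping nonlinear. All names, the equal class priors and the example statistics are assumptions for illustration only.

```python
import math


def log_gauss(x, mean, std):
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def speech_posterior(c0, silence, speech):
    """Soft-decision VAD: P(speech | C0) with equal class priors (assumed)."""
    return sigmoid(log_gauss(c0, *speech) - log_gauss(c0, *silence))


def linear_equalize(x, observed, reference):
    """Class-dependent linear map matching mean and standard deviation."""
    (mu, sd), (mu_ref, sd_ref) = observed, reference
    return mu_ref + (x - mu) * sd_ref / sd


def peq(x, c0, model):
    """Posterior-weighted interpolation of the two class-dependent maps."""
    w = speech_posterior(c0, model["c0_silence"], model["c0_speech"])
    x_sil = linear_equalize(x, model["obs_silence"], model["ref_silence"])
    x_sp = linear_equalize(x, model["obs_speech"], model["ref_speech"])
    return (1.0 - w) * x_sil + w * x_sp


# Hypothetical (mean, std) statistics for one utterance, equalizing C0 itself.
example_model = {
    "c0_silence": (20.0, 2.0), "c0_speech": (40.0, 5.0),
    "obs_silence": (20.0, 2.0), "obs_speech": (40.0, 5.0),
    "ref_silence": (10.0, 1.0), "ref_speech": (40.0, 6.0),
}

if __name__ == "__main__":
    for c0 in (20.0, 30.0, 40.0):
        print(c0, peq(c0, c0, example_model))
```

For frames that the classifier assigns clearly to one class, the output reduces to that class's linear equalization; only frames near the class boundary see a blend of the two.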
10
Parametric Equalization (3)
11
Parametric Equalization (4)
  In comparison with HEQ, PEQ transformations are smoother
  For C0 a monotonic transformation is obtained
  For other coefficients, the interpolated transformation is not monotonic
12
Parametric Equalization (5)
  BASE
    MFCC_0_D_A_Z (39 components)
  HEQ
    Quantile-based CDF transformation
    Clean reference
    Implemented over MFCC_0 / CMS and regressions computed after HEQ
  AFE
    Standard implementation
  PEQ
    Clean reference
    Implemented over MFCC_0 / CMS and regressions computed after PEQ
13
Parametric Equalization (6)
  Current work
    Development of an on-line version
    Relax the diagonal covariance assumption
    Investigate the normalization of dynamic features
    Use a more detailed model of speech frames (i.e. more than one Gaussian)
14
Schedule
  HIWIRE database evaluations
  Non-linear feature normalization
    ECDF segmental implementation (LOQ)
    Parametric equalization
  Robust VAD
    Bispectrum-based VAD
  Model-based feature compensation
    VTS results on AURORA4
    Including uncertainty caused by noise
15
Bispectrum-based VAD (1)
  Motivations:
    Ability of higher-order statistics to detect signals in noise
    Polyspectra methods rely on a priori knowledge of the input processes
  Issues to be addressed:
    Computationally expensive
    The variance of bispectrum estimators is much higher than that of power spectral estimators for an identical data record size
  Solution: integrated bispectrum
    J. K. Tugnait, "Detection of non-Gaussian signals using integrated polyspectrum," IEEE Trans. on Signal Processing, vol. 42, no. 11, pp. 3137–3149, 1994.
    Computationally efficient, reduced-variance statistical test based on the integrated polyspectra
    Detection of an unknown random, stationary, non-Gaussian signal in Gaussian noise
16
Bispectrum-based VAD (2)
  Integrated bispectrum:
    Defined as a cross spectrum between the signal and its square; it is therefore a function of a single frequency variable
  Benefits:
    Its computation as a cross spectrum leads to significant computational savings
    The variance of the estimator is of the same order as that of the power spectrum estimator
  Properties:
    The bispectrum of a Gaussian process is identically zero, so its integrated bispectrum is zero as well
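A minimal sketch of the cross-spectrum view described on this slide: the integrated bispectrum is estimated by block-averaging the cross periodogram between the mean-removed square of the signal and the signal itself. The block length, the mean removal per block, the conjugation convention and the plain O(N^2) DFT are choices made for this illustration, not details taken from the slides.

```python
import cmath
import math


def dft(block):
    """Plain discrete Fourier transform (kept dependency-free on purpose)."""
    n = len(block)
    return [
        sum(block[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def integrated_bispectrum(signal, block_len=64):
    """Block-averaged cross spectrum between x^2 (mean removed) and x."""
    n_blocks = len(signal) // block_len
    if n_blocks == 0:
        raise ValueError("signal shorter than one block")
    acc = [0j] * block_len
    for b in range(n_blocks):
        x = signal[b * block_len:(b + 1) * block_len]
        mean_x = sum(x) / block_len
        x = [v - mean_x for v in x]        # assumed zero-mean process
        sq = [v * v for v in x]
        mean_sq = sum(sq) / block_len
        y = [v - mean_sq for v in sq]      # y(t) = x(t)^2 - E[x^2]
        spec_x, spec_y = dft(x), dft(y)
        for k in range(block_len):
            acc[k] += spec_y[k] * spec_x[k].conjugate() / block_len
    return [a / n_blocks for a in acc]


if __name__ == "__main__":
    tone = [math.sin(0.3 * t) + 0.5 * math.sin(0.6 * t + 0.4) for t in range(256)]
    ib = integrated_bispectrum(tone)
    print(max(abs(v) for v in ib))
```

The result depends on a single frequency index, and each block costs two transforms plus one product per bin, which is where the computational saving over a full two-dimensional bispectrum estimate comes from.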
17
Bispectrum-based VAD (3)
  Two alternatives explored for formulating the decision rule:
    Estimation by block averaging:
    MO-LRT (multiple-observation likelihood ratio test)
      Given a set of N = 2m+1 consecutive observations:
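The decision-rule equations were lost with the slide images. The general idea of a multiple-observation likelihood ratio test over N = 2m+1 consecutive frames can be sketched as a sliding sum of per-frame log-likelihood ratios compared with a threshold. The zero-mean scalar Gaussian likelihoods, the truncated windows at the edges and all parameter values are assumptions for this example; the slides apply the test to integrated bispectrum coefficients with their own likelihoods and variances.

```python
import math


def frame_log_lr(obs, var_noise, var_active):
    """log p(obs | H1) - log p(obs | H0) for zero-mean Gaussians (assumed model)."""
    return 0.5 * (
        math.log(var_noise / var_active)
        + obs * obs * (1.0 / var_noise - 1.0 / var_active)
    )


def mo_lrt(observations, m, var_noise, var_active, threshold=0.0):
    """Speech/non-speech decision per frame from a window of up to 2m+1 frames."""
    llr = [frame_log_lr(o, var_noise, var_active) for o in observations]
    n = len(llr)
    decisions = []
    for l in range(n):
        lo, hi = max(0, l - m), min(n, l + m + 1)  # window truncated at the edges
        decisions.append(sum(llr[lo:hi]) > threshold)
    return decisions


if __name__ == "__main__":
    frames = [0.1] * 10 + [3.0] * 10 + [0.1] * 10
    print(mo_lrt(frames, m=2, var_noise=0.05, var_active=4.0))
```

Summing over neighbouring frames smooths the decision and extends detected speech segments by up to m frames on each side, which acts like a built-in hangover.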
18
Bispectrum-based VAD (4)
  Likelihoods
  Variances
19
Bispectrum-based VAD results (1)
20
Bispectrum-based VAD results (2)
21
Bispectrum-based VAD results (3)
22
Schedule
  HIWIRE database evaluations
  Non-linear feature normalization
    ECDF segmental implementation (LOQ)
    Parametric equalization
  Robust VAD
    Bispectrum-based VAD
  Model-based feature compensation
    VTS results on AURORA4
    Including uncertainty caused by noise
23
Schedule
  Model-based feature compensation
    VTS: results on AURORA 4
      VTS formulation
      VTS vs non-linear feature normalization procedures
      VTS results on AURORA 4
    Including uncertainty caused by noise
      Including uncertainty in noise compensation
      Wiener filtering + uncertainty: results on AURORA 2
      Wiener filtering + uncertainty: results on AURORA 4
      VTS + uncertainty: formulation
      Numerical integration of probabilities: formulation
24
VTS formulation
  VTS: Vector Taylor Series approach to remove additive (and channel) noise
  References:
    P. J. Moreno. "Speech recognition in noisy environments." Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, Pennsylvania, Apr. 1996.
    A. de la Torre. "Técnicas de mejora de la representación en los sistemas de reconocimiento automático del habla" [Techniques for improving the representation in automatic speech recognition systems]. Ph.D. Thesis, University of Granada, Spain, Apr. 1999.
25
VTS formulation
  VTS provides an estimation of the clean speech in a statistical framework:
  Log-FBO (log filter-bank output) domain, assumed additive noise:
  Effect of noise described using the "correction function" g():
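The equations of this slide were images and are not preserved. For reference, the usual way of writing the additive-noise model in the log filter-bank domain is given below, with x, n and y the clean-speech, noise and noisy-speech log energies of one channel. The argument and sign convention chosen for g() here is an assumption and may differ from the one on the original slide.

```latex
\begin{aligned}
e^{y} &= e^{x} + e^{n} \\
y &= \log\left(e^{x} + e^{n}\right) = x + g(n - x) \\
g(z) &= \log\left(1 + e^{z}\right)
\end{aligned}
```

With this convention g(n - x) tends to 0 when speech dominates (y is close to x) and to n - x when noise dominates (y is close to n).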
26
VTS formulation
  Auxiliary functions f() and h():
  1st and 2nd derivatives:
  VTS provides an estimation of the noisy-speech Gaussian given the clean-speech and the noise Gaussians:
  Noisy-speech Gaussian obtained with the expected values:
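Since the formulas of this slide are missing, the following sketch shows one standard way to obtain a noisy-speech Gaussian from a clean-speech Gaussian and a noise Gaussian with a Taylor expansion of y = x + g(n - x), g(z) = log(1 + exp(z)), around the means. Here f and h are simply taken to be the first and second derivatives of g; whether this matches the slide's exact definitions of f() and h() is an assumption, as is the scalar (per-channel, independent x and n) setting.

```python
import math


def g(z):
    """Correction function g(z) = log(1 + exp(z)), written in a stable form."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def f(z):
    """First derivative of g: the logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def h(z):
    """Second derivative of g."""
    s = f(z)
    return s * (1.0 - s)


def vts_noisy_gaussian(mu_x, var_x, mu_n, var_n, second_order=True):
    """Mean and variance of y = x + g(n - x) for independent Gaussian x and n."""
    z0 = mu_n - mu_x
    dy_dn = f(z0)            # sensitivity of y to the noise
    dy_dx = 1.0 - dy_dn      # sensitivity of y to the clean speech
    mu_y = mu_x + g(z0)
    if second_order:
        # second-order term of the expected value (cross term vanishes)
        mu_y += 0.5 * h(z0) * (var_x + var_n)
    var_y = dy_dx ** 2 * var_x + dy_dn ** 2 * var_n
    return mu_y, var_y


if __name__ == "__main__":
    print(vts_noisy_gaussian(10.0, 1.0, 0.0, 1.0))  # speech well above noise
    print(vts_noisy_gaussian(0.0, 1.0, 0.0, 1.0))   # speech and noise at the same level
```

When speech is far above the noise the noisy Gaussian stays close to the clean one; when both are at the same level the mean is shifted upwards and the variance is reduced.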
27
VTS formulation
  Noisy-speech Gaussian: formulas:
  Models for noise and clean speech:
28
VTS formulation
  The model for clean speech provides the model for noisy speech, and also P(k|y) (posterior probability of each Gaussian):
  Estimation of clean speech:
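The estimation formulas of this slide are not preserved. A common VTS-style estimator, sketched below for a single log filter-bank channel, builds a noisy-speech Gaussian for each clean-speech Gaussian k, computes P(k|y) from those noisy Gaussians, and subtracts the posterior-weighted correction g() evaluated at the Gaussian means. The first-order variance, the scalar setting and the example mixture are assumptions for illustration and are not claimed to be the exact estimator used in the talk.

```python
import math


def softplus(z):
    """g(z) = log(1 + exp(z)) in a numerically stable form."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_gauss(v, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (v - mean) ** 2 / var)


def vts_clean_estimate(y, clean_gmm, mu_n, var_n):
    """Return (x_hat, posteriors) for clean_gmm = [(weight, mean, var), ...]."""
    log_post, corrections = [], []
    for weight, mu_x, var_x in clean_gmm:
        z0 = mu_n - mu_x
        s = sigmoid(z0)
        mu_y = mu_x + softplus(z0)                         # noisy-speech mean
        var_y = (1.0 - s) ** 2 * var_x + s ** 2 * var_n    # first-order variance
        log_post.append(math.log(weight) + log_gauss(y, mu_y, var_y))
        corrections.append(softplus(z0))
    top = max(log_post)
    post = [math.exp(lp - top) for lp in log_post]
    total = sum(post)
    post = [p / total for p in post]                       # P(k | y)
    x_hat = y - sum(p * c for p, c in zip(post, corrections))
    return x_hat, post


if __name__ == "__main__":
    # Hypothetical two-Gaussian clean-speech model and a single noise Gaussian.
    gmm = [(0.5, 0.0, 1.0), (0.5, 10.0, 1.0)]
    print(vts_clean_estimate(10.0, gmm, mu_n=5.0, var_n=0.25))
```

The posterior selects the clean-speech Gaussian whose noisy counterpart best explains y, so the correction is small for high-energy frames and large for frames that sit at the noise level.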
29
VTS vs non-linear feature normalization
  VTS:
    Statistical framework:
      Model for noise in log-FBO domain: 1 Gaussian PDF
      Model for clean speech in log-FBO domain: Gaussian mixture
      Noise assumed to be additive in FBO domain
    Accurate description of noise process
    ACCURATE COMPENSATION
  Non-linear feature normalization:
    No a priori assumption
    Component-by-component
    MORE FLEXIBLE, LESS ACCURATE
30
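VTS results on AURORA 4

| Experiment | Train mode | Test size | WER exp. 01-07 (%) | WER exp. 08-14 (%) | WER exp. 01-14 (%) |
| --- | --- | --- | --- | --- | --- |
| Baseline | Clean | 166 | 40.53 | 50.60 | 45.57 |
| HEQ | Clean | 166 | 32.19 | 42.74 | 37.47 |
| Parametric non-linear EQ | Clean | 166 | 28.78 | 34.27 | 31.53 |
| VTS | Clean | 166 | 29.46 | 37.22 | 33.34 |
| VTS (noise known) | Clean | 166 | 26.97 | 32.25 | 26.97 |
| AFE | Clean | 166 | 27.57 | 34.99 | 31.28 |
| Baseline | Multi | 166 | 24.58 | 29.88 | 27.23 |

Note: the 01-14 value for VTS (noise known) repeats the 01-07 figure as given on the slide; in every other row the 01-14 column is the mean of the two subsets, which would give 29.61 % here.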
31
Including uncertainty in noise compensation
  Noise is a random process: we do not know n, but p(n)
  Then, from an observation y we cannot find x, but p(x | y, λx, λn) (λx, λn: clean-speech and noise models)
  Usually, compensation procedures provide E[x | y, λx, λn]
  What about the uncertainty of x?
  Mean and variance of x:
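The expressions for the mean and variance were part of the slide image. In generic form, with the dependence on the speech and noise models left implicit in p(x | y), they are the first two moments of the posterior distribution of the clean speech:

```latex
\begin{aligned}
\hat{x} &= E[x \mid y] = \int x \, p(x \mid y) \, dx \\
\sigma_{\hat{x}}^{2} &= E\left[(x - \hat{x})^{2} \mid y\right]
  = \int (x - \hat{x})^{2} \, p(x \mid y) \, dx
\end{aligned}
```

A conventional compensation scheme passes only the first quantity to the recognizer; the second one measures how reliable that estimate is for each frame and component.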
32
Including uncertainty in noise compensation
33
Including uncertainty in noise compensation
  An approach for the estimation of the variance:
  Evaluation of HMM Gaussians:
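The slide's own formulas are missing. A widely used way of taking the uncertainty into account when evaluating an HMM Gaussian is to integrate the Gaussian over the distribution of the clean feature; when that distribution is itself Gaussian with mean x_hat and variance var_hat, the result is the same Gaussian evaluated at x_hat with the two variances added. The sketch below assumes diagonal covariances; the function names are illustrative, and whether the talk used exactly this rule is an assumption.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_gauss_with_uncertainty(x_hat, var_hat, mean, var):
    """Integral of N(x; mean, var) * N(x; x_hat, var_hat) over x, in the log domain.

    The closed form is N(x_hat; mean, var + var_hat) per component.
    """
    inflated = [v + u for v, u in zip(var, var_hat)]
    return log_gauss_diag(x_hat, mean, inflated)


if __name__ == "__main__":
    mean, var = [0.0, 1.0], [1.0, 2.0]
    x_hat = [5.0, 1.2]        # first component is a poor, outlying estimate
    reliable = [0.0, 0.0]
    unreliable = [4.0, 0.0]   # large uncertainty on the first component only
    print(log_gauss_diag(x_hat, mean, var))
    print(log_gauss_with_uncertainty(x_hat, reliable, mean, var))
    print(log_gauss_with_uncertainty(x_hat, unreliable, mean, var))
```

Components with a large estimated variance are de-emphasised in the score, so a badly compensated frame penalises the correct model less than it would with a point estimate.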
34
Wiener filt. + uncertainty: results on AURORA 2
  Preliminary results with Wiener filtering:
  Results on Aurora 2 with Wiener filtering + uncertainty

| Experiment | Train mode | WER Set A (%) | WER Set B (%) | WER Set C (%) | Aver. WER (%) |
| --- | --- | --- | --- | --- | --- |
| Wiener | Clean | 15.75 | 15.87 | 17.62 | 16.17 |
| Wiener + Uncert. | Clean | 12.13 | 12.90 | 13.28 | 12.67 |
| Wiener | Multi | 8.91 | 10.44 | 10.95 | 9.93 |
| Wiener + Uncert. | Multi | 8.87 | 10.34 | 10.69 | 9.82 |
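The "Aver. WER" column is consistent with the usual Aurora 2 weighting, in which Sets A and B (four noise types each) count twice as much as Set C (two noise types). The slide does not state the weighting, so the 2:2:1 rule below is an assumption that is simply checked against the reported figures; the relative reduction is computed from the reported averages.

```python
# WER figures (%) copied from the slide: (Set A, Set B, Set C, reported average).
results = {
    ("Wiener", "Clean"): (15.75, 15.87, 17.62, 16.17),
    ("Wiener + Uncert.", "Clean"): (12.13, 12.90, 13.28, 12.67),
    ("Wiener", "Multi"): (8.91, 10.44, 10.95, 9.93),
    ("Wiener + Uncert.", "Multi"): (8.87, 10.34, 10.69, 9.82),
}


def aurora2_average(set_a, set_b, set_c):
    """Average WER with Sets A and B weighted twice as much as Set C (assumed)."""
    return (2.0 * set_a + 2.0 * set_b + set_c) / 5.0


def relative_reduction(baseline, improved):
    """Relative WER reduction in percent."""
    return 100.0 * (baseline - improved) / baseline


for (system, train), (a, b, c, reported) in results.items():
    print(f"{system:17s} {train:5s} computed {aurora2_average(a, b, c):.2f}  reported {reported:.2f}")

clean_gain = relative_reduction(results[("Wiener", "Clean")][3],
                                results[("Wiener + Uncert.", "Clean")][3])
multi_gain = relative_reduction(results[("Wiener", "Multi")][3],
                                results[("Wiener + Uncert.", "Multi")][3])
print(f"relative WER reduction, clean training: {clean_gain:.1f} %")
print(f"relative WER reduction, multi training: {multi_gain:.1f} %")
```

The gain from adding uncertainty is much larger with clean training than with multi-condition training, where the models already absorb part of the residual noise variability.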
35
Wiener filter + uncertainty: results on AURORA 4

| Experiment | Train mode | Test size | WER exp. 01-07 (%) | WER exp. 08-14 (%) | WER exp. 01-14 (%) |
| --- | --- | --- | --- | --- | --- |
| Baseline | Clean | 166 | 40.53 | 50.60 | 45.57 |
| HEQ | Clean | 166 | 32.19 | 42.74 | 37.47 |
| Parametric non-linear EQ | Clean | 166 | 28.78 | 34.27 | 31.53 |
| VTS | Clean | 166 | 29.46 | 37.22 | 33.34 |
| Wiener + Uncertainty | Clean | 166 | 27.68 | 33.79 | 30.74 |
| AFE | Clean | 166 | 27.57 | 34.99 | 31.28 |
| Baseline | Multi | 166 | 24.58 | 29.88 | 27.23 |
36
VTS + uncertainty: formulation
  VTS-based estimation of clean speech:
  VTS-based estimation of variance:
37
Numerical integration of probabilities: formulation
  Computation of expected values:
  Numerical integration of expected values:
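The integration formulas of this slide are not available. As one possible illustration, the sketch below computes the posterior mean and variance of the clean speech x for a single log filter-bank channel by numerical integration, assuming Gaussian priors for x and n and the additive model y = log(exp(x) + exp(n)). The change of variable s = n - x (for which x = y - g(s), n = y - g(-s), with unit Jacobian), the midpoint rule and the integration limits are choices made for this example, not taken from the slides.

```python
import math


def softplus(z):
    """g(z) = log(1 + exp(z)) in a numerically stable form."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_gauss(v, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (v - mean) ** 2 / var)


def posterior_moments(y, mu_x, var_x, mu_n, var_n,
                      s_min=-40.0, s_max=40.0, steps=4000):
    """E[x | y] and Var[x | y] by midpoint integration over s = n - x.

    Assumes the posterior mass of s lies inside [s_min, s_max].
    """
    ds = (s_max - s_min) / steps
    log_w, xs = [], []
    for i in range(steps):
        s = s_min + (i + 0.5) * ds
        x = y - softplus(s)      # clean speech implied by (y, s)
        n = y - softplus(-s)     # noise implied by (y, s)
        log_w.append(log_gauss(x, mu_x, var_x) + log_gauss(n, mu_n, var_n))
        xs.append(x)
    top = max(log_w)             # rescale to avoid underflow of all weights
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, xs)) / total
    var = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, xs)) / total
    return mean, var


if __name__ == "__main__":
    # Speech well above the noise: the estimate stays close to y, small variance.
    print(posterior_moments(10.0, mu_x=10.0, var_x=1.0, mu_n=0.0, var_n=1.0))
    # Observation at the noise level: the estimate falls back to the speech prior.
    print(posterior_moments(10.0, mu_x=0.0, var_x=1.0, mu_n=10.0, var_n=1.0))
```

The two cases show the intended behaviour of the uncertainty: it is negligible when the observation is dominated by speech and approaches the prior variance when the observation is dominated by noise.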