Advances in WP1 Nancy Meeting – 6-7 July 2006 www.loquendo.com.


2 Advances in WP1 Nancy Meeting – 6-7 July 2006 www.loquendo.com

3 WP1: Environment & Sensor Robustness
T1.2 Noise Independence
Noise Reduction:
– Spectral Subtraction (YEAR 1) and Spectral Attenuation (YEAR 2): “Automatic Speech Recognition With a Modified Ephraim-Malah Rule”, Roberto Gemello, Franco Mana and Renato De Mori, IEEE Signal Processing Letters, vol. 13, no. 1, January 2006
– Evaluation of HEQ for feature normalization (HEQ study + Revision 2)

4 Denoising Techniques for Y2 evaluations (1)
Ephraim–Malah MMSE log estimator rule: Spectral Attenuation (or spectral weighting) is a form of audio signal enhancement in which noise suppression can be viewed as the application of a suppression rule, or non-negative real-valued gain G_k, to each bin k of the observed signal magnitude spectrum, in order to form an estimate of the original signal magnitude spectrum.
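A minimal sketch of the per-bin gain idea described above, in Python. The gain used here is a plain Wiener-style rule chosen only for illustration; it is not the Ephraim-Malah MMSE log estimator named on the slide, and the function names and the `floor` parameter are hypothetical.

```python
def wiener_style_gain(noisy_mag, noise_mag, floor=0.1):
    """Illustrative suppression rule (NOT Ephraim-Malah): one gain per bin k."""
    gains = []
    for y, n in zip(noisy_mag, noise_mag):
        if y <= 0.0:
            gains.append(floor)
        else:
            # 1 - noise_power / observed_power, clamped from below by `floor`
            gains.append(max(floor, 1.0 - (n * n) / (y * y)))
    return gains


def apply_gain(noisy_mag, gains):
    """Estimate the clean magnitude spectrum as G_k * |Y_k| for every bin k."""
    return [g * y for g, y in zip(gains, noisy_mag)]


noisy = [4.0, 1.0, 2.0]   # observed magnitude spectrum (made-up values)
noise = [1.0, 1.0, 1.0]   # noise magnitude estimate (made-up values)
gains = wiener_style_gain(noisy, noise)
clean_estimate = apply_gain(noisy, gains)
```

Whatever rule produces the gains, the enhancement step itself stays the same multiplication per bin, which is why different suppression rules can be swapped in without touching the rest of the front end.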

5 Denoising Techniques for Y2 evaluations (2)
Modified Ephraim–Malah MMSE log estimator rule: We propose to make the estimation of the a priori and the a posteriori SNR dependent on the noise overestimation factor α(m) and the spectral floor β(m).

6 Denoising Techniques for Y2 evaluations (3)
The noise spectrum amplitude is obtained by a first-order recursion in conjunction with an energy-based Voice Activity Detector (VAD). The recursion uses two constants: one controls its update speed (0.9) and the other controls the allowed dynamics of the noise (4.0). It also uses an estimate of the noise standard deviation σ(m).
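The update formulas themselves are not reproduced in this transcript, so the following Python is only one plausible reading of the description above, not the authors' method. It assumes the noise estimate for one frequency bin is updated by a first-order recursion in frames the VAD marks as non-speech, and that frames deviating from the current estimate by more than `dynamics` standard deviations are skipped. The values 0.9 and 4.0 come from the slide; the function name, the parameter names, and the variance update are hypothetical.

```python
import math


def track_noise(magnitudes, is_speech, update_speed=0.9, dynamics=4.0):
    """Track the noise magnitude of one frequency bin across frames (sketch)."""
    noise = magnitudes[0]      # assumed initialisation: first frame is noise
    variance = 0.0             # running variance of the noise magnitude
    estimates = []
    for y, speech in zip(magnitudes, is_speech):
        sigma = math.sqrt(variance)
        # Accept the frame if it stays within the allowed dynamics. While no
        # deviation has been observed yet (variance == 0) every frame passes.
        within_dynamics = variance == 0.0 or abs(y - noise) <= dynamics * sigma
        if not speech and within_dynamics:
            deviation = y - noise
            noise = update_speed * noise + (1.0 - update_speed) * y
            variance = (update_speed * variance
                        + (1.0 - update_speed) * deviation ** 2)
        estimates.append(noise)
    return estimates


steady = track_noise([2.0, 4.0], [False, False])
speech_skipped = track_noise([1.0, 10.0], [False, True])
outlier_skipped = track_noise([2.0, 4.0, 100.0], [False, False, False])
```

With an update speed of 0.9 the estimate moves only a tenth of the way towards each accepted frame, so short bursts have little effect while slow drifts in the background are still followed.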

7 Baseline evaluations of Loquendo ASR on Aurora2 speech databases

8 Year 1+2 Performance evaluations
        Test A                    Test B                    Test C                    A-B-C Avg
Models  Clean        Multi        Clean        Multi        Clean        Multi        Clean        Multi
ND      24.4         6.5          22.5         8.9          24.7         9.8          23.7         8.1
WM      16.0 (34.4)  6.1 (6.1)    15.6 (30.7)  7.9 (11.2)   16.7 (32.4)  9.5 (3.0)    16.0 (32.5)  7.5 (7.4)
EMM     14.7 (39.7)  6.0 (7.7)    15.8 (29.8)  8.0 (10.1)   15.2 (38.5)  8.9 (9.2)    15.2 (35.9)  7.4 (8.6)
The testing conditions used in the experiments are the following:
1) No Denoising (ND): Rasta PLP features (RPLP) are used without any preliminary noise reduction.
2) Wiener modified (WM): RPLP with Wiener filtering dependent on global SNR.
3) Ephraim-Malah modified (EMM): RPLP with noise reduction based on the modified Ephraim-Malah spectral attenuation rule.
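The slide does not say what the figures in parentheses are. They are consistent with the relative reduction of each result with respect to the ND row of the same column, so the short Python check below is written under that assumption; the function name is hypothetical.

```python
def relative_reduction(baseline, improved):
    """Percentage reduction of `improved` with respect to `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Test A, clean models: ND 24.4 versus WM 16.0
test_a_clean_wm = round(relative_reduction(24.4, 16.0), 1)   # 34.4
# A-B-C average, clean models: ND 23.7 versus EMM 15.2
avg_clean_emm = round(relative_reduction(23.7, 15.2), 1)     # 35.9
```

Both values match the parenthesised entries in the table, which supports reading them as relative improvements over the no-denoising baseline.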

9 Baseline evaluations of Loquendo ASR on Aurora3 speech databases

10 Year 1+2 Performance evaluations
The testing conditions used in the experiments are the following:
1) No Denoising (ND): Rasta PLP features (RPLP) are used without any preliminary noise reduction.
2) Wiener modified (WM): RPLP with Wiener filtering dependent on global SNR.
3) Ephraim-Malah modified (EMM): RPLP with noise reduction based on the modified Ephraim-Malah spectral attenuation rule.
      Ita WM       Ita HM       Spa WM       Spa HM
ND    1.8          53.4         2.7          25.4
WM    1.7 (5.5)    22.5 (57.9)  2.4 (11.1)   10.1 (60.2)
EMM   1.6 (11.1)   17.8 (66.7)  2.3 (14.8)   11.5 (54.7)

11 Baseline evaluations of Loquendo ASR on Aurora4 speech databases

12 Year 1+2 Performance evaluations
The testing conditions used in the experiments are the following:
1) No Denoising (ND): Rasta PLP features (RPLP) are used without any preliminary noise reduction.
2) Wiener modified (WM): RPLP with Wiener filtering dependent on global SNR.
3) Ephraim-Malah modified (EMM): RPLP with noise reduction based on the modified Ephraim-Malah spectral attenuation rule.
CLEAN Models
      CLEAN          Car            Babble         Restaurant     Street         Airport        Train Station  Noise avg.
ND    14.8           45.7           76.9           70.6           66.0           70.7           67.7           66.3
WM    14.8 (0.0)     33.0 (27.8)    63.4 (17.5)    69.3 (1.8)     56.9 (13.8)    68.1 (3.7)     51.2 (24.4)    57.0 (14.0)
EMM   14.5 (2.02)    29.6 (35.2)    62.9 (18.2)    68.4 (3.1)     54.2 (17.8)    68.4 (3.2)     46.3 (31.6)    55.0 (17.0)

13 Year 1+2 Performance evaluations
The testing conditions used in the experiments are the following:
1) No Denoising (ND): Rasta PLP features (RPLP) are used without any preliminary noise reduction.
2) Wiener modified (WM): RPLP with Wiener filtering dependent on global SNR.
3) Ephraim-Malah modified (EMM): RPLP with noise reduction based on the modified Ephraim-Malah spectral attenuation rule.
MULTI Models
      CLEAN          Car            Babble         Restaurant     Street         Airport        Train Station  Noise avg.
ND    15.7           24.8           40.1           41.8           41.9           39.1           42.3           38.3
WM    16.6 (-5.7)    24.1 (2.8)     39.7 (1.0)     43.2 (-3.3)    39.6 (5.5)     39.5 (-1.0)    37.1 (12.3)    37.2 (2.9)
EMM   15.5 (1.3)     24.7 (0.4)     40.4 (-0.7)    44.2 (-5.7)    39.5 (5.7)     40.4 (-3.3)    38.2 (9.7)     37.9 (1.0)

14 HEQ + Denoising techniques

15 HEQ Evaluation: Revision 1 (1) (Loquendo & UGR)
HEQ (121): E+12CEP, DE+12DEP, DDE+12DDEP (39 coefficients)
Problems:
(1) Context dependency (whole-utterance CDF estimation works best)
(2) High variability in background-noise segments
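As a rough illustration of histogram equalization over a sliding window, the Python sketch below maps each value of a single coefficient track to a standard Gaussian through its empirical CDF in the surrounding window. The Gaussian reference, the rank-based CDF with a half-sample offset, the edge handling, and all names are assumptions made for the example; the slides do not describe the UGR implementation at this level. The default window of 121 frames is taken from the slide.

```python
from statistics import NormalDist


def equalize(value, window_values):
    """Map `value` to a standard Gaussian via its empirical CDF in the window."""
    rank = sum(1 for v in window_values if v < value)
    # Half-sample offset keeps the CDF strictly inside (0, 1)
    cdf = (rank + 0.5) / len(window_values)
    return NormalDist().inv_cdf(cdf)


def heq_track(track, window=121):
    """Equalize one coefficient track frame by frame (window truncated at edges)."""
    half = window // 2
    out = []
    for t, value in enumerate(track):
        start = max(0, t - half)
        stop = min(len(track), t + half + 1)
        out.append(equalize(value, track[start:stop]))
    return out


normal_gain = heq_track([3.0, 1.0, 2.0])
low_gain = heq_track([0.3, 0.1, 0.2])   # same shape at a lower level
```

Because the mapping depends only on ranks inside the window, the two tracks above produce the same output, which is the level-equalization effect the later slides compare. It also shows why short windows are context dependent: the result for a frame changes with whatever else falls inside its window.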

16 HEQ Integration: Revision 1 (2) (Loquendo & UGR)
Loquendo FE -> UGR HEQ -> Loquendo ASR
Loquendo FE: Denoise (Power Spectrum level)
UGR HEQ: Feature Normalization (Frame -39coeff- level)
Loquendo ASR: Phoneme-based Models
AURORA3 ITA - HM
           SA      WA      WI      WD      WS
Loquendo   46.6%   77.5%   4.8%    7.2%    10.4%
+HEQ121    38.2%   69.6%   4.3%    12.6%   13.5%
HEQ121     37.9%   69.1%   3.5%    13.8%   13.5%
+HEQ1001   46.5%   77.7%   4.0%    7.3%    11.0%

17 HEQ Evaluation: Revision 2 (3) (Loquendo & UGR)
HEQ (1573): E+12CEP, DE+12DEP, DDE+12DDEP (39 coefficients)
Benefits:
(1) Relations in magnitude and dynamics among coefficients are preserved
(2) More stable CDF estimation, similar to extending the HEQ temporal window

18 HEQ Evaluation: Revision 2 (4) (Loquendo & UGR)
AURORA3 ITA - HM
            SA      WA      WI      WD      WS
WM          46.6%   77.5%   4.8%    7.2%    10.4%
HEQ121      47.9%   77.7%   5.1%    6.7%    10.5%
HEQ241      49.7%   79.7%   4.3%    6.6%    9.3%
WM+HEQ121   49.0%   79.2%   5.1%    5.7%    10.0%
WM+HEQ241   50.8%   79.8%   4.6%    6.1%    9.4%

19 HEQ for denoising (5) (Loquendo & UGR)
Comparing RPLP / HEQrev1 / HEQrev2 using the same clean and noisy signal

20 HEQ for signal level equalization (6) (Loquendo & UGR)
Comparing RPLP / HEQrev1 / HEQrev2 using the same clean signal at normal gain level and at low gain level

21 WP1: Workplan
– Selection of suitable benchmark databases (m6)
– Completion of LASR baseline experimentation of Spectral Subtraction (Wiener SNR dependent) (m12)
– Discriminative VAD (training + AURORA3 testing) (m16)
– Experimentation of Spectral Attenuation rule (Ephraim-Malah SNR dependent) (m21)
– Preliminary results on spectral subtraction and HEQ techniques (m24)
– Integration of denoising and normalization techniques (m33)
– Noise estimation and reduction for non-stationary noises (m33)

