Advances in WP1 – Turin Meeting, 9-10 March 2006 – www.loquendo.com


1 Advances in WP1 – Turin Meeting, 9-10 March 2006 – www.loquendo.com

2 WP1: Environment & Sensor Robustness – T1.2 Noise Independence
Voice Activity Detection:
– Model-based approach using NN: "Non-linear estimation of voice activity to improve automatic recognition of noisy speech", Roberto Gemello, Franco Mana and Renato De Mori, Eurospeech 2005, Lisboa, September 2005
Noise Reduction:
– Spectral Subtraction (standard, Wiener and SNR-dependent) and Spectral Attenuation (standard and SNR-dependent Ephraim-Malah): "Automatic Speech Recognition With a Modified Ephraim-Malah Rule", Roberto Gemello, Franco Mana and Renato De Mori, IEEE Signal Processing Letters, Vol. 13, No. 1, January 2006
– Evaluation of HEQ for feature normalization
– New techniques for non-stationary noises
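As a rough illustration of the spectral subtraction family named above, the sketch below implements basic power-spectral subtraction with an over-subtraction factor and a spectral floor. The `alpha` and `beta` values are illustrative assumptions, and the SNR-dependent variants evaluated in the project adapt the over-subtraction per frame; this is not the Loquendo implementation.

```python
import numpy as np

def spectral_subtraction(power_spec, noise_est, alpha=2.0, beta=0.01):
    """Basic power-spectral subtraction with a spectral floor.

    power_spec : (frames, bins) noisy power spectrum
    noise_est  : (bins,) noise power estimate, e.g. averaged over
                 non-speech frames flagged by a VAD
    alpha      : over-subtraction factor (illustrative value)
    beta       : spectral floor fraction, avoids negative power
    """
    cleaned = power_spec - alpha * noise_est      # subtract scaled noise
    floor = beta * power_spec                     # keep a residual floor
    return np.maximum(cleaned, floor)
```

The floor term is what separates practical spectral subtraction from the naive version: without it, over-subtraction produces negative power bins and audible "musical noise" artifacts.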

3 WP1: Environment & Sensor Robustness – T1.2 Noise Independence
Voice Activity Detection:
– A model-based approach using NNs (Neural Networks) to discriminate two classes (noise and voice) will be explored;
– NN input could be standard features (cepstral coefficients, energy) after noise reduction, possibly complemented by other features (pitch/voicing) produced by other partners (IRST);
– The training set will be multi-style, including several types of noise conditions and languages.
Noise Reduction:
– Several noise reduction techniques will be tested on the test sets selected as benchmarks for the project: Spectral Subtraction (standard, Wiener and SNR-dependent) and Spectral Attenuation (standard and SNR-dependent Ephraim-Malah);
– New techniques for non-stationary noises.
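The NN-based speech/noise discrimination described above can be sketched as a tiny two-class network over per-frame features. Everything here is a toy: the two synthetic features, the network size and the training loop are assumptions for illustration, not the Loquendo system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-frame features (e.g. log-energy + one cepstral coefficient):
# noise frames cluster at low energy, voice frames at high energy.
noise = rng.normal([-2.0, 0.0], 0.5, size=(500, 2))
voice = rng.normal([+2.0, 1.0], 0.5, size=(500, 2))
X = np.vstack([noise, voice])
y = np.concatenate([np.zeros(500), np.ones(500)])  # 0 = noise, 1 = voice

# One hidden layer (tanh), sigmoid output, plain full-batch gradient descent.
W1 = rng.normal(0, 0.5, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    return h, p.ravel()

lr = 0.1
for _ in range(500):
    h, p = forward(X)
    g = (p - y)[:, None] / len(y)        # d(cross-entropy)/d(logit)
    W2 -= lr * (h.T @ g); b2 -= lr * g.sum(0)
    gh = (g @ W2.T) * (1 - h ** 2)       # backprop through tanh
    W1 -= lr * (X.T @ gh); b1 -= lr * gh.sum(0)

_, p = forward(X)
accuracy = np.mean((p > 0.5) == y)       # frame-level VAD accuracy
```

In a real system the per-frame decisions would additionally be smoothed over time (hangover), since isolated frame flips are common at speech/noise boundaries.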

4 WP1: Speech Databases for Noise Reduction
Aurora 2 – Connected digits. TIdigits data downsampled to 8 kHz, filtered with a G.712 characteristic, with noise artificially added at several SNRs (20 dB, 15 dB, 10 dB, 5 dB, 0 dB, -5 dB). There are three test sets:
– A: same noises as in training: subway, babble, car, exhibition hall;
– B: 4 different noises: restaurant, street, airport, train station;
– C: same noises as A but filtered with a different microphone characteristic.
Aurora 3 – Connected digits recorded in a car environment. Signal collected by hands-free (ch1) and close-talk (ch0) microphones. In HIWIRE we use the Italian and Spanish recordings. There are two test sets:
– WM (well matched): ch0 and ch1 recordings used in both training and testing lists;
– HM (highly mismatched): ch0 for training and ch1 for testing.
Aurora 4 – Continuous speech, 5k vocabulary. It is WSJ0 5K with 6 kinds of added noise: car, babble, restaurant, street, airport, train station. It uses the standard bigram language model. CLEAN / MULTI-CONDITION training modes.
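The Aurora noisy data are built by scaling a noise signal so that the mixture hits a target SNR. A minimal sketch of that mixing step (not the official Aurora tooling; the function name and interface are assumptions):

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the
    requested signal-to-noise ratio in dB."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Noise power needed for the target SNR: SNR_dB = 10*log10(Ps/Pn)
    target_p_noise = p_speech / (10.0 ** (snr_db / 10.0))
    return speech + noise * np.sqrt(target_p_noise / p_noise)
```

In the real corpora the speech power is estimated on speech-active segments only (a VAD decides which frames count), so results from this whole-signal version would differ slightly.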

5 Baseline evaluations of Loquendo ASR on Aurora2 speech databases

6 Baseline Performance Evaluations
Performance in terms of Word Accuracy and (Error Reduction):

  CLEAN Models        | Test A       Test B       Test C       A-B-C
  RPLP                | 75.6         77.5         75.3         76.3
  + Wiener SNR Dep.   | 84.0 (34.4)  84.4 (30.7)  83.3 (32.4)  84.0 (32.5)
  + EphMal SNR Dep.   | 85.3 (39.7)  84.2 (29.5)  84.8 (34.5)  84.8 (35.9)

  MULTI Models        | Test A       Test B       Test C       A-B-C        Avg.
  RPLP                | 93.5         91.1         90.2         91.9         84.1
  + Wiener SNR Dep.   | 93.9 (6.1)   92.1 (11.2)  90.5 (3.1)   92.5 (7.4)   88.2 (25.8)
  + EphMal SNR Dep.   | 94.0 (7.7)   92.0 (10.1)  91.1 (9.2)   92.6 (8.6)   88.7 (28.9)

  LASR Models         | Test A       Test B       Test C       A-B-C
  RPLP                | 80.9         83.3         77.6         81.2
  + Wiener SNR Dep.   | 88.1 (37.7)  88.3 (29.9)  86.2 (38.4)  87.8 (35.1)
  + EphMal SNR Dep.   | 89.0 (42.4)  88.6 (31.7)  87.0 (41.9)  88.4 (38.3)
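The parenthesized (Error Reduction) figures in these tables are relative reductions of the word error rate, which the accuracy numbers in the same rows confirm. A small helper reproducing the computation:

```python
def error_reduction(baseline_acc, improved_acc):
    """Relative word-error-rate reduction, in percent.

    Both arguments are word accuracies in percent; the error
    rate is 100 - accuracy.
    """
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    return 100.0 * (baseline_err - improved_err) / baseline_err
```

For example, `error_reduction(75.6, 84.0)` gives 34.4, matching the CLEAN-models Test A entry for Wiener: the error rate drops from 24.4% to 16.0%, a 34.4% relative reduction.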

7 Baseline evaluations of Loquendo ASR on Aurora3 speech databases

8 Baseline Performance Evaluations
Performance in terms of Word Accuracy and (Error Reduction):

  Aurora3 Models      | Ita WM       Ita HM       Spa WM       Spa HM
  RPLP                | 98.2         46.6         97.3         74.6
  + Wiener SNR Dep.   | 98.3 (5.5)   77.5 (59.4)  97.6 (11.1)  89.9 (60.2)
  + EphMal SNR Dep.   | 98.4 (11.1)  82.2 (66.7)  97.7 (14.8)  88.5 (54.7)

  LASR Models         | Ita WM       Ita HM       Spa WM       Spa HM
  RPLP                | -            56.4         -            79.4
  + Wiener SNR Dep.   | -            74.6 (41.7)  -            84.9 (26.6)
  + EphMal SNR Dep.   | -            75.5 (43.8)  -            86.2 (33.0)

9 Baseline evaluations of Loquendo ASR on Aurora4 speech databases (to be done)

10 Baseline Performance Evaluations (Sennheiser microphone)
This test was performed with the Loquendo ASR using the CLEAN / MULTI-CONDITION models trained on the Aurora4 training list. The test used the CLEAN and the 6 noise testing lists.

  CLEAN Models       | Clean        Car          Babble       Rest.        Street       Airport      Train St.    Noise Avg.
  RPLP               | 85.2         54.3         23.1         29.4         34.0         29.3         32.3         33.7
  + Wiener SNR Dep.  | 85.2 (0.0)   67.0 (27.8)  36.6 (17.5)  30.7 (1.8)   43.1 (13.8)  31.9 (3.7)   48.8 (24.4)  43.0 (14.0)

  MULTI Models       | Clean        Car          Babble       Rest.        Street       Airport      Train St.    Noise Avg.
  RPLP               | 84.3         75.2         59.9         58.2         58.1         60.9         57.7         61.7
  + Wiener SNR Dep.  | 84.0 (-1.9)  75.9 (2.8)   60.3 (1.0)   56.8 (-3.3)  60.4 (5.5)   60.5 (-1.0)  62.9 (12.3)  62.8 (2.9)

11 Baseline Performance Evaluations (second microphone)
This test was performed with the Loquendo ASR using the CLEAN / MULTI-CONDITION models trained on the Aurora4 training list. The test used the CLEAN and the 6 noise testing lists.

  MULTI Models       | Clean        Car          Babble       Rest.        Street       Airport      Train St.    Noise Avg.
  RPLP               | 68.3         59.0         45.5         45.3         43.3         48.6         44.1         47.6
  + Wiener SNR Dep.  | 65.6 (-7.9)  59.6 (1.5)   47.7 (4.0)   44.7 (-1.1)  45.8 (4.4)   47.8 (-1.5)  48.0 (7.0)   48.9 (2.5)

  CLEAN Models       | Clean        Car          Babble       Rest.        Street       Airport      Train St.    Noise Avg.
  RPLP               | 59.4         35.7         16.2         21.4         22.9         19.3         22.8         23.1
  + Wiener SNR Dep.  | 60.1 (1.7)   50.2 (22.5)  25.7 (11.3)  23.6 (2.8)   29.2 (8.2)   22.6 (4.1)   35.1 (15.9)  31.1 (10.4)

12 Baseline Performance Evaluations (Loquendo Models)
This test was performed with the Loquendo ASR using the Loquendo models trained on MACROPHONE. The test used the CLEAN and the 6 noise testing lists, with both microphones.

  Sennheiser mic     | Clean        Car          Babble       Rest.        Street       Airport      Train St.    Noise Avg.
  RPLP               | 81.2         68.1         43.9         41.9         41.0         43.7         41.6         46.7
  + Wiener SNR Dep.  | 81.2 (0.0)   74.5 (20.1)  53.7 (17.5)  46.4 (7.7)   53.5 (21.2)  46.9 (5.7)   59.9 (31.3)  55.8 (17.1)

  Second mic         | Clean        Car          Babble       Rest.        Street       Airport      Train St.    Noise Avg.
  RPLP               | 70.8         56.2         35.7         36.4         33.6         38.2         36.2         39.4
  + Wiener SNR Dep.  | 71.0 (0.7)   65.0 (20.1)  47.2 (17.9)  40.4 (6.3)   46.2 (19.0)  42.5 (7.0)   53.1 (26.5)  49.1 (16.0)

13 HEQ Evaluation (1) (Loquendo & UGR)
The HEQ algorithm introduces an amplification of the coefficient (energy, in this case) in background-noise audio segments.

14 HEQ Evaluation (2) (Loquendo & UGR)
The HEQ algorithm introduces a context-dependent normalization. This can be a drawback for open-vocabulary recognizers, where phoneme-based acoustic models are used.

15 HEQ Integration (3) (Loquendo & UGR)
Pipeline: Loquendo FE → UGR HEQ → Loquendo ASR
– Denoising (power spectrum level)
– Feature normalization (frame level, 39 coefficients)
– Phoneme-based models

  AURORA3 ITA - HM | SA      WA      WI      WD      WS
  Loquendo         | 46.6%   77.5%   4.8%    7.2%    10.4%
  + HEQ121         | 38.2%   69.6%   4.3%    12.6%   13.5%
  + HEQ1001        | 46.5%   77.7%   4.0%    7.3%    11.0%

(SA: sentence accuracy; WA: word accuracy; WI/WD/WS: word insertions, deletions, substitutions.)
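HEQ (histogram equalization) maps each feature dimension of the test data so that its empirical distribution matches a reference distribution estimated on training data. A minimal quantile-mapping sketch of the generic technique (numpy-only; this is not the UGR implementation, and the function name and interface are assumptions):

```python
import numpy as np

def heq(test_feats, ref_feats):
    """Histogram equalization by quantile mapping, per dimension.

    test_feats : (n_test, dims) features to normalize
    ref_feats  : (n_ref, dims) reference (training) features
    Each test value is replaced by the reference quantile at the
    same empirical-CDF position.
    """
    out = np.empty_like(test_feats, dtype=float)
    n = test_feats.shape[0]
    for d in range(test_feats.shape[1]):
        # Rank of each test frame within its own dimension (0..n-1).
        ranks = np.argsort(np.argsort(test_feats[:, d]))
        # Empirical CDF position, kept strictly inside (0, 1).
        cdf = (ranks + 0.5) / n
        # Inverse reference CDF via sample quantiles.
        out[:, d] = np.quantile(ref_feats[:, d], cdf)
    return out
```

Because the mapping is driven by the ranks of the values in the current segment, the output depends on what else is in the segment; this is exactly the context dependence noted as a drawback on the previous slide.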

16 WP1: Workplan
– Selection of suitable benchmark databases (m6)
– Completion of LASR baseline experimentation of Spectral Subtraction (Wiener SNR-dependent) (m12)
– Discriminative VAD (training + AURORA3 testing) (m16)
– Experimentation of the Spectral Attenuation rule (Ephraim-Malah SNR-dependent) (m21)
– Integration of denoising and normalization techniques (m33)
– Noise estimation and reduction for non-stationary noises (m33)

